The present paper is limited to one phase of a continuing controversy, that of “psychic” phenomena. Within this general field, there is the rubric concerned with dice throwing and related tests designated as “psychokinesis” (PK). In everyday language, if ESP represents “mind to mind,” PK refers to “mind to matter.” Thus Rhine and Pratt (1957, p. 7) speak of two main subdivisions of parapsychology: extrasensory perception and PK. For them, PK is “The direct influence exerted on a physical system by a subject without any known intermediate physical energy or instrumentation” (p. 209). In preparing this review, more than 200 publications dealing with one phase or another of PK, or with its hypothetical relationship to ESP, have been examined. Because of limitations of space, it is not possible to review every report in detail, but every attempt has been made to ensure complete coverage of the PK data reports, the first of which appeared in 1942. In an area saturated with controversy, it should be expected that a review confined to PK will be considered by many to be too restricted. For some holding this view, the definitive proof of ESP has been obtained, whereas PK is still controversial. Others would hold that definitive proof of a “qualitative” or “spontaneous” nature existed prior to the establishment of the Duke Parapsychology Laboratory.

1 Grateful acknowledgment is made for the Thomas Welton Stanford Fellowship (Stanford University) and the John Simon Guggenheim Fellowship, which made this effort possible. The present paper constitutes one phase of a larger study of the process of controversy in scientific endeavors.

There are a number of reasons to justify a review devoted specifically to PK. After study of the published data and discussions with interested parties here and abroad, it seems clear that all of the issues which have been raised with respect to ESP also appear in connection with PK. The topic constitutes a unit which can be considered within the limits of a single publication. It is an area with which the academic psychologist is generally unfamiliar. Although criticism of some of the PK reports, especially the earlier dice tests, has appeared, there has been no assessment of psychokinesis as a whole.
PK constitutes a controversy within a controversy, with different positions taken concerning the reality of one or another aspect of psychic research. On the one hand, there is the assumption by Rhine, as well as by some other believers, that PK and ESP are related. For Rhine (1944a), “The proof that the mind is extraphysical in nature does not rest, however, on the ESP work alone; it has received powerful confirmation from many of the PK researches which have especially borne upon this issue” (p. 250). Rhine (1947) states that “The most revealing fact about PK is its close tie-up with ESP [p. 120]. . . . PK implies ESP, and ESP implies PK” (p. 129). Murphy (1952a) noted that “The law of the decline curve [reduction in scoring rate for designated targets] which we quoted earlier in relation to ESP holds in PK, as Dr. Rhine had shown earlier, just as it does in ESP” (p. 57). In a review of the field, no mention was made of PK (Murphy, 1958), but most recently it was concluded (Murphy & Dale, 1961) that “the thoughtful modern reader can no longer slam the door on psychokinesis” (p. 182). Thouless and Wiesner (1946) extended the use of the term “psi” so as to include PK. For Heywood (1959), the convincing evidence comes from McConnell, Thouless, and Fisk (cf. Table 2). Eccles (1951), well known in neurophysiology, has proposed a cortical theory incorporating ESP and PK into brain physiology. The PK studies were interpreted as evidence for the reality of a “force” that was presumed to have made it possible for a famous Italian medium in 1908 to cause a stool to rise (Ducasse, 1951).

On the other hand, Flew (1953) “must confess to almost invincible incredulity” (p. 104). Eysenck (1958), who accepts ESP, stated that with respect to Rhine’s PK data “we should be cautious in accepting these results until they have been duplicated successfully elsewhere” (p. 140). In what McConnell (1954) has called the most important book on parapsychology since “ESP-60” (Rhine, Pratt, Smith, Stuart, & Greenwood, 1940), Soal speaks of PK as an “alleged” effect (Soal & Bateman, 1954, p. 360). Elsewhere, Soal (1948) noted that “Dr. Rhine’s book [Reach of the Mind] certainly merits ‘remarkable’ in more senses than one” (p. 185). McConnell (1948) commented that Soal “accepts telepathy but not PK. . . . I do not understand the type of mind which is bold enough to defy orthodox science by accepting telepathy and yet is so timid as to deny psychokinesis when the evidence for the latter is rather better than for the former” (pp. 242 f.). West (1945) first judged that “Without calling the experimenters liars, the case for PK does not seem to be challengeable; it is probably even more clear-cut than the case for ESP itself” (p. 290). Some years later, however, West (1954b) judged that “Further research is needed before it can be accepted as an established concept on par with ESP. . . . There is nothing definite to connect PK with the very palpable forces associated with physical mediums” (p. 115). These differences among those committed to the field, as well as the agnosticism of many academic psychologists, indicate a need for a thorough review of the area.

EDWARD GIRDEN


PK HYPOTHESIS


There would appear to be no question that the hypothesis, however vaguely formulated, behind the Duke Parapsychology Laboratory dice throwing tests was psychological in nature. With respect to the Early Dice tests (cf. Table 1), Rhine (1944b) concluded that “Now in 1944, ten years later, the position to which one is driven by the results of these dice throwing studies is this: There is a direct psychical effect on the fall of the dice . . . and may be termed psychokinesis, or PK” (pp. 190 f.). Elsewhere, he says “Psychokinesis is accordingly produced by no mere blind and purposeless force . . . PK reacts with the physical object according to intelligent design and direction” (Rhine, 1947, p. 117; also cf. Rhine, 1957).

In its simplest terms, the experimental design of the PK tests was based on wishing for specified targets in throwing a given number of dice. The significance of the results was a function of the number of target “hits” beyond “chance.” Speaking of the 18 studies upon which the case for PK was based, Rhine and Pratt (1957) state that all “involved the same essential operation, the subject’s [S’s] conscious effort to influence the fall of dice so as to make a specified face or combination of faces turn up” (p. 60). This ostensibly implies that PK is indicated by obtaining target hits. But a post-mortem evaluation of these data suggested a second criterion: an “extrachance” decline in the number of hits for specified targets (cf. Rhine, 1957; Rhine & Humphrey, 1944a; Rhine & Pratt, 1957). For McConnell (McConnell, Snowdon, & Powell, 1955), “the evidence for psychokinesis rests on two statistical effects. These are the total deviation from chance expectation of wished-for die faces, and the occurrence of extra-chance declines in scoring rates for these faces” (p. 269). Although, strictly speaking, hit score and decline in scoring are indices, the PK hypothesis, for the present purpose, will be identified with these two effects.
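The two effects can be made concrete with a small calculation. The sketch below is illustrative only and rests on stated assumptions: a single target face on a fair die (p = 1/6), runs of 24 die throws, and a decline measured simply as first-half minus second-half hits. That last measure is a hypothetical stand-in; the decline analyses actually used in the PK literature are taken up in the section on Declines. The run scores in the usage lines are made up.

```python
import math

THROWS_PER_RUN = 24  # a run is 24 die throws, as defined in the text
P_HIT = 1 / 6        # assumption: one target face on a fair die


def total_deviation(run_scores):
    """Hits beyond chance expectation, summed over all runs."""
    expected = len(run_scores) * THROWS_PER_RUN * P_HIT
    return sum(run_scores) - expected


def half_decline_cr(run_scores):
    """Critical ratio of first-half hits minus second-half hits.

    Hypothetical decline index: the runs are split into two equal halves
    (a middle run is dropped when the count is odd), and each half is
    treated as an independent binomial count under the chance hypothesis.
    A positive value means scoring fell off in the second half.
    """
    half = len(run_scores) // 2
    if half == 0:
        raise ValueError("need at least two runs")
    first = run_scores[:half]
    second = run_scores[len(run_scores) - half:]
    var_per_run = THROWS_PER_RUN * P_HIT * (1 - P_HIT)
    sd_difference = math.sqrt(var_per_run * 2 * half)
    return (sum(first) - sum(second)) / sd_difference


if __name__ == "__main__":
    scores = [6, 5, 5, 4, 4, 3, 3, 2]  # hypothetical scores, not from any report
    print(f"total deviation: {total_deviation(scores):+.1f} hits")
    print(f"first-half vs second-half CR: {half_decline_cr(scores):.2f}")
```

With these made-up scores the total deviation is zero while the two halves still differ, so the two indices can point in different directions for the same data.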
With respect to experimental design, an independent variable was employed only occasionally. Thus, if the variable is to be “wishing” for given die faces to appear (i.e., “target hit”), an adequate experimental test is most readily devised by comparing the scores obtained in wishing versus nonwishing sequences. If, however, there is no independent variable, the experimental design rests upon a significant statistical departure from the Probability Model. There are at least two important considerations here. First, there is the necessity of adequately controlling all known factors. Second, there is the assumption that the unknown (but knowable) factors are randomly distributed. Given these, a significant deviation from chance expectancy is taken as evidence for extranormal phenomena. There are obvious experimental dangers in such dependency upon a Probability Model. The difficulty of applying an operational definition of the paranormal based on it is not to be minimized (cf. Boring, 1955). This is undoubtedly another source of the controversy with respect to ESP as well as PK.
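Both routes reduce to a critical ratio (CR): a deviation divided by the standard deviation expected under the Probability Model. The sketch below assumes the binomial model with runs of 24 die throws and a single target face (p = 1/6); the function names are illustrative. As a check, it is applied to the two Frick solo series described later under Control Tests: +582 hits on the six-face while wishing for 6's, and +576 while not wishing for them, 2,172 runs each.

```python
import math

THROWS_PER_RUN = 24  # die throws per run, as defined in the text


def run_variance(p, throws_per_run=THROWS_PER_RUN):
    """Binomial variance of the hit count in one run."""
    return throws_per_run * p * (1 - p)


def cr_against_chance(deviation, runs, p):
    """Departure from the Probability Model: deviation / expected SD."""
    return deviation / math.sqrt(runs * run_variance(p))


def cr_wish_vs_nowish(dev_wish, runs_wish, dev_nowish, runs_nowish, p):
    """Independent-variable design: compare mean deviation per run.

    Both series are treated as independent binomial samples with the
    same hit probability p under the null hypothesis.
    """
    diff = dev_wish / runs_wish - dev_nowish / runs_nowish
    se = math.sqrt(run_variance(p) / runs_wish + run_variance(p) / runs_nowish)
    return diff / se


if __name__ == "__main__":
    p_six = 1 / 6
    print(f"wishing for 6's:     CR = {cr_against_chance(582, 2172, p_six):.2f}")
    print(f"not wishing for 6's: CR = {cr_against_chance(576, 2172, p_six):.2f}")
    print(f"difference:          CR = {cr_wish_vs_nowish(582, 2172, 576, 2172, p_six):.2f}")
```

The first CR agrees with the 6.84 listed for this series in Table 1. Set side by side, the two series differ by a CR of only about 0.05: a wishing versus nonwishing comparison can expose this, whereas a bare departure from chance cannot.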

In addition to the dice throwing tests (Tables 1 and 2), two other categories of data are reported in support of the PK hypothesis: the throwing of objects other than dice (Table 3), and tests in which thrown (or released) objects are intended to come to rest in given target areas (Table 4). The present analysis is thus divided into four sections: (a) the early pioneer and (b) later phases in dice throwing tests; (c) wishing with objects other than dice; and (d) placement series, in which the released objects are required to land in a specified area of the target table. Certain other reports have been omitted from consideration here, e.g., wishing with plants (cf. Loehr, 1959; Vasse, 1950; Vasse & Vasse, 1948) and paramecia (Richmond, 1952). Similarly, a number of reports with animals (Osis, 1952) have been excluded. It would be a fruitless exercise to determine whether such reports belong in PK or ESP. Since, from the published record, these reports are not presented as crucial to the case for PK, detailed treatment of these data is not pertinent here.

Some commonly employed terms should also be noted. A hit is the appearance of a given die face when it has been designated as target. With more than one die per throw (e.g., 2-96), the score is the total number of dice showing the wished-for target face. H(igh) targets are combined face readings of eight or higher with two dice per throw, or readings on a pair of successive throws with a single die (Hilton, Baer, & Rhine, 1943). Conversely, Lows are scores of 2 to 6 inclusive. “7’s” is a combined score with a pair of dice totaling 7 in any combination, such as 1 and 6 or 3 and 4. In Reeves and Rhine (1945), however, the score was for “doubles,” i.e., both dice had to turn up the six-face to be considered a hit (Table 2). Targets given as 1-6 indicate that each face served as target, although not necessarily for an equal proportion of the total number of trials. A run consists of 24 die throws, whether 24 single-die throws or one throw of 24 dice. Thus a single casting of 96 dice would be tallied as 4 runs. Throughout, there will be reference to “declines,” in which significance is attached to a decline in the rate of scoring for designated targets. This concept will be treated specifically in the section on Declines.
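The chance expectations implied by these definitions can be worked out exactly. The sketch assumes fair dice and, for H, L, and 7's targets, one scored reading per pair of dice, so that a 24-die-throw run holds 12 scored pairs. Under that assumption the High and Low expectations come to 5.00 per run, the figure the paper quotes as chance for the Reeves low-dice series. The function names are illustrative.

```python
from fractions import Fraction
from itertools import product

THROWS_PER_RUN = 24  # die throws per run, as defined in the text


def pair_sum_probability(predicate):
    """Exact probability that the sum of two fair dice satisfies predicate."""
    outcomes = list(product(range(1, 7), repeat=2))
    favourable = sum(1 for a, b in outcomes if predicate(a + b))
    return Fraction(favourable, len(outcomes))


def chance_hits_per_run():
    """Expected hits per run for the target types defined in the text."""
    pairs_per_run = THROWS_PER_RUN // 2  # assumption: one reading per pair
    return {
        "single face": THROWS_PER_RUN * Fraction(1, 6),
        "High (8-12)": pairs_per_run * pair_sum_probability(lambda s: s >= 8),
        "Low (2-6)": pairs_per_run * pair_sum_probability(lambda s: s <= 6),
        "7's": pairs_per_run * pair_sum_probability(lambda s: s == 7),
    }


def runs_in_casting(dice_cast):
    """Runs tallied for one casting: total die throws divided by 24."""
    return Fraction(dice_cast, THROWS_PER_RUN)


if __name__ == "__main__":
    for target, expected in chance_hits_per_run().items():
        print(f"{target:12s} {float(expected):.2f} hits per run")
    print(f"one casting of 96 dice = {runs_in_casting(96)} runs")
```

Exact fractions are used so that the 15/36 probabilities for High and Low readings, and the resulting 5 hits per run, come out without rounding.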


DICE TESTS

Early Dice Tests²

2 It is to be noted that there is a gap of 6-9 years between the collection of these data (1934-37) and their first publication in 1943. One can appreciate the delay in publishing these reports “to avoid any publicity about our PK work. At the time, 1934-37, the storm of controversy was rising over the ESP research reported in 1934, and we thought it best to withhold the PK work until that subsided” (Rhine, 1947, p. 104). Concerning the controversy over the early card-guessing reports, see “The ESP Symposium at the A.P.A.” (Anonymous, 1938), Kennedy’s (1939) critique, “ESP-60” (Rhine et al., 1940), and the Ciba symposium (Wolstenholme & Millar, 1956). It is a matter of speculative fantasy what the fate of the early dice reports would have been had they been submitted for publication to Rhine’s own journal during 1939-41. During this time, G. Murphy and B. F. Riess served as editors and were supported by an advisory committee of eminent psychologists. The editors had accepted Rhine’s invitation provided that (Murphy & Riess, 1939) “Emphasis is to be on consistent technical reporting with very detailed accounts of experimental and statistical method” (p. 1). The exchanges during these 3 years between the Advisory Board, authors, and Rhine, and the latter’s conception of the responsibilities of an Advisory Board to a scientific journal, are illuminating.

The case made for PK is based on the first 21 reports in Table 1, data which were collected during 1934-37 and published in the Journal of Parapsychology 1943-46. Two reports, by Nicol and Carington (1947) and by Nash (1944), belong in this period since they were carried out, respectively, in 1934-36 and 1940, independently of Rhine. Excluding the negative McDougall series, the first 19 studies in Table 1 constitute a very impressive score. There was a total of +6,515 target hits (i.e., appearances of wished-for designated target faces). The total number of runs given in the published reports exceeds that given here, since some unwitnessed runs and control series were excluded. But these constitute minor discrepancies between the present tabulation and that given by Rhine and Humphrey (1944a, p. 28), which also included some data from the Later Dice tests. An overall summary of the test conditions and the main findings is given in Table 1.

In the simplest terms, the critical question is whether this outcome was the result of wishing or of any other psychological (and/or psychic) variable. And was the subsequently reported decline in scoring also psychological in origin? In terms of conventional scientific practice, an answer to these questions is in part a function of the conditions under which the results were obtained. These considerations will be evaluated as they are relevant: some with respect to the present material, i.e., Early Dice
tests; others in later sections of this analysis.

TABLE 1
EARLY DICE TESTS

Reference | E | Ss | Throw | Target | Dice/throw | Runs | Deviation | CR
Rhine & Rhineᵃ (1943) | | 25 | h, c, m | H | 2 | 562 | 300 | 7.40
Reeves & Rhine (1943) | Reeves | 1 (E) | h | H | 2 | 492 | 290 | 7.65
Gibson & Rhine (1943) | Gibson | 1 (w) | ? | 6 | 6 | 90 | 66 | 3.81
Hilton, Baer, & Rhine (1943) | H & B | 2 (Es) | h, m | H | 1, 2 | 484 | 130 | 3.46
Hilton & Rhine (1943) | Hilton | 3 | ? | H | 2 | 824 | 243 | 4.95
Rhineᵃ (1943) | | 3 | c, M | 6 | 2 | 110 | 64 | 3.34
Gibson, Gibson, & Rhine (1943) | Gibson | 1 (w) | c | 1-6 | 6 | 6,033 | 1,057 | 7.46
Rhine & Humphrey (1943) | McDougall | 9 | h | ? | ? | 269 | negative |
Gibson, Gibson, & Rhine (1944) | Gibson | 2 (E + w) | M | 1-6 | ? | 1,491 | 191 | 4.13
Rhine & Humphrey (1944) | Frick | 2 (E + w) | ? | 1-6 | 24 | 2,292 | 320 | 3.67
Price & Rhineᵃ (1944) | Woodruff | 1 (E) | M | 6 | 2 | 60 | 58 | 4.10
Rhine (1944)ᵇ | Smithᶜ | 1 (E) | h | 6 | 2, 6 | 629 | 201 | 4.39
Humphrey & Rhineᵃ (1945) | Woodruff | ? | M | 6 | 2 | 126 | 69 | 3.37
Reeves & Rhine (1945) | Reevesᶜ | 1 (E) | h | 6s | 2 | 309 | 28 | 2.56
Rhineᵃ (1945) | | 7 | h, M | ? | 2 | 20 | 66 | 3.57
Rhine & Humphrey (1945a)ᵇ | Frickᶜ | 1 (E) | c | 6 | 60 | 2,172 | 582 | 6.84
Rhine & Humphreyᵃ (1945b)ᵇ | | 26 | c | 6 | 6 | ? | 2,541 | 15.79
Averill & Rhine (1945) | | 2 | b | 6 | 96 | 240 | 93 | 3.29
Rhine, H., & Averillᵃ (1945)ᵇ | | 4 | b | 6 | 96 | 680 | 216 | ?
Rhineᵃ (1946a)ᵇ | | 5 | b | 6 | 96 | 480 | ? | ?
Woodruff & Rhineᵃ (1942) | | 1 (E) | c, M | 1-6 | 1 | 70 | 73 | 4.78
Nicol & Carington (1947) | Nicol | 23 | ? | 1-6 | 1 | ? | ? | ?
Nash (1944) | | 113 | ? | ? | ? | ? | ? | ?

Note. Unless otherwise indicated, author is E; w under Ss is wife of E. Dice throw is by b (box release), c (cup), h (hand), m (mechanical release), and M (machine-tumbled). Each run = total of 24 die throws, regardless of number of dice/throw. Deviation is number of (target) hits beyond chance expectancy. Blank indicates C(ritical) R(atio) is insignificant.
ᵃ Duke Report.
ᵇ Minor Report.
ᶜ E, as S, working alone.

Systematic Procedures

Ignoring for the moment the basic consideration of experimental design, it is clear that these tests were largely free-wheeling and off the cuff. Variability of test conditions was a common occurrence. Whatever other weaknesses result from such practices, replication of test conditions is impossible. When Reeves was working solo at home, “For the most part, an equal number of runs [were] made on high and low dice each day” (Reeves & Rhine, 1943, p. 80). In Hilton, Baer, and Rhine (1943), Ss were permitted to throw one or a pair of dice, and, also in Hilton and Rhine (1943), to choose a pair of dice from any of three pairs of different sizes. In the first of these two Hilton reports, it was also reported that there was “no optional stopping” since the
tests were terminated by the close of the semester. In the Gibson series, there were no established routines, nor was the order of target faces reported, although “at one time or another, they threw for each of the 6 faces” (Gibson, Gibson, & Rhine, 1943, p. 229). In this test, some runs were obtained with the experimenter (E) and some other individuals as Ss. In the later study (Gibson, Gibson, & Rhine, 1944), in addition to the two Gibsons who served as Ss, 429 miscellaneous runs were obtained with other Ss. In the first Frick series, ostensibly all die faces were used as targets, but the six-face was chosen for about 75% of the throws, with complete “freedom allowed S” (Rhine & Humphrey, 1944c, p. 142). In the Smith solo, there were no established routines, one-sixth of the runs were obtained with another S, and optional stopping occurred. Faces 1, 2, and 4 served as targets for 97 runs (Rhine, 1944b). In Humphrey and Rhine (1945), all Ss preferred and used exclusively the six-face as target. In general, there was considerable variation in target designation within the same test. Ss were free to choose any desired face as target. When slumps occurred, E might suggest that the rate or manner of throwing be varied. For the most part, the six-face was the predominantly chosen target. In general, informality was the rule; a well-designed, rigorously executed test was the exception.


Recording Errors 


Little or no effort was made to ensure accuracy in recording the obtained scores. It might be unfair to have required completely objective records in these early tests, e.g., photographic recording of all die faces on all throws. Certainly, it would not have been unreasonable to require two independent recorders for all die throws (cf. Dale, 1946; Dale & Woodruff, 1947). On the other hand, records made by a single E aware of target designations, and especially those made when an unwitnessed E was also S, give little confidence in the data. As noted in Table 1, four tests were entirely solo efforts. In the solo efforts, the hazard of recording what is anticipated (the wished-for target) is most critical. This difficulty is not lessened when only a small number of dice are cast, e.g., when one die or a pair of dice is thrown. In this circumstance it has been noted that “if there are no hits at a given throw, the recorder may tend to make no entry.”³
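The direction and rough size of the bias described in this quotation can be estimated with a simple model. The sketch below is hypothetical and is not drawn from any of the reports. It assumes a single die with one target face, that every hit is entered, and that each no-hit throw is left unrecorded with some fixed probability.

```python
THROWS_PER_RUN = 24  # die throws per run, as defined in the text


def recorded_hit_rate(p_hit, omit_rate):
    """Apparent hit rate when a fraction of no-hit throws goes unrecorded.

    Hypothetical model: hits are always entered; each miss is omitted
    with probability omit_rate, independently of every other throw.
    """
    if not 0 <= omit_rate < 1:
        raise ValueError("omit_rate must be in [0, 1)")
    kept_misses = (1 - p_hit) * (1 - omit_rate)
    return p_hit / (p_hit + kept_misses)


def spurious_hits_per_run(p_hit, omit_rate):
    """Excess hits per recorded run produced by the omissions alone."""
    return THROWS_PER_RUN * (recorded_hit_rate(p_hit, omit_rate) - p_hit)


if __name__ == "__main__":
    for omit in (0.0, 0.02, 0.05, 0.10):
        extra = spurious_hits_per_run(1 / 6, omit)
        print(f"omit {omit:4.0%} of misses -> {extra:+.3f} hits/run, "
              f"{1000 * extra:+.0f} per 1,000 runs")
```

Under this model, omitting even 5% of the misses adds roughly 0.17 hits per run, or about 174 spurious hits over 1,000 runs. This illustrates why two independent recorders matter.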

Questions of accuracy of recording are equally serious in those cases in which many dice were involved. As indicated in Table 1, in the first Frick series there were 24 dice per throw; in three other Duke exploratory series there were 96 dice to be counted on each box release. In the alcohol series (Averill & Rhine, 1945), each author ingested a cocktail concocted of 100 cc of gin and an equal amount of a soft drink. One S “lost” the drink during the experiment, and JBR was able to retain it only until the tests had been completed. A third person present functioned as E to record hits. Averill and Rhine served again as Ss 2 days later in the caffeine study (Rhine, Humphrey, & Averill, 1945), the results of which are considered such as “to render any future discussion of the chance hypothesis unnecessary” (p. 87). It was noted that Averill and Rhine “did not feel as alert as usual” (p. 81), and again the third person served as E. A combined or changing role in the experiment may interfere with reliable recording. With Woodruff as S, Price tried to distract him in “friendly rivalry” (Price & Rhine, 1944, p. 180) and also to record the hits. In some tests, two individuals served, respectively, in the roles of recorder and observer; yet frequently these individuals also served as Ss (e.g., in Table 1, the three Gibson series and the first Frick report).

3 D. Parsons, personal communication, August 1961.


Pooling of Data 


The practice of pooling data obtained under different conditions, statistically or experimentally, is not a defensible procedure. This was manifestly apparent in Rhine and Rhine (1943). There is a set of data, in addition to that reported in Table 1, for Mrs. Reeves working at home alone. With low dice as target, on 435 runs the average score was 5.21 (5.00 = chance expectation), or +92 hits, with a reported, barely significant CR of 2.58 (Reeves & Rhine, 1943, p. 81). The authors, combining both sets of data, report an excess of H and L hits of 382 (CR = 7.34). In Hilton, Baer, and Rhine (1943), two graduating seniors interchanged as S and E and were free to cast one or two of three pairs of different-sized dice. Of the total of 484 runs, three-fourths were hand-thrown, and the scores on the remainder (mechanically released) were a little better. Although it was stated that “optional stopping” was not involved, the tests were concluded when the Ss graduated. The total deviation of +130 hits (484 runs) was a pooled score of trials with different-sized dice, thrown in combinations of one and two dice, manually and mechanically. This total was reported to be significantly better than a nonwishing control series of 128 runs (+4 hits) with medium-sized dice alone. As Nicol has emphasized in the Ciba symposium, the difference in score between the control series and the main-series runs obtained with the medium-sized dice alone is statistically insignificant, “a modest 0.27” (Wolstenholme & Millar, 1956, p. 34). More important, no distribution is given of the proportions of single-die and two-dice throws, mechanically released or manually thrown, or of the sizes of dice, for each (or both) of the Ss. Distributions of scores, whether as High targets (8’s or higher) or otherwise, are totally lacking.
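The Reeves figures can be rechecked with the binomial critical ratio. The sketch assumes two dice per throw with one High or Low reading per pair (p = 15/36, 12 pairs per 24-die-throw run). The High-target values (+290 hits in 492 runs) come from the Reeves and Rhine (1943) row of Table 1, and the Low-target values (+92 hits in 435 runs) from the paragraph above. The sketch shows how a pooled CR can be carried almost entirely by one component series.

```python
import math

P_HIGH_OR_LOW = 15 / 36  # P(sum >= 8) = P(sum <= 6) for two fair dice
PAIRS_PER_RUN = 12       # assumption: 24 die throws scored as 12 pairs


def critical_ratio(deviation, runs, p=P_HIGH_OR_LOW, trials=PAIRS_PER_RUN):
    """Deviation divided by its binomial standard deviation."""
    return deviation / math.sqrt(runs * trials * p * (1 - p))


SERIES = {
    "High targets (Table 1)": (290, 492),
    "Low targets, home alone": (92, 435),
}

if __name__ == "__main__":
    for name, (dev, runs) in SERIES.items():
        print(f"{name:24s} +{dev:<4d} in {runs} runs  CR = {critical_ratio(dev, runs):.2f}")
    pooled_dev = sum(dev for dev, _ in SERIES.values())
    pooled_runs = sum(runs for _, runs in SERIES.values())
    print(f"{'Pooled':24s} +{pooled_dev:<4d} in {pooled_runs} runs  "
          f"CR = {critical_ratio(pooled_dev, pooled_runs):.2f}")
```

Up to rounding in the last digit, this reproduces the reported CRs of 7.65, 2.58, and 7.34. The pooled value looks nearly as strong as the High series alone, while the Low series by itself is only barely significant. The pooled figure therefore says little about the conditions under which either component was obtained.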

The data in Table 1 for Hilton and Rhine (1943) are pooled results for Hilton, his sister, and his brother-in-law. It is admitted that “In terms of controls, this research is less complete than is commonly the case” (p. 204). The data in Table 1 for the Gibson Large Cup Series were contributed 14% by Gibson, 72% by his wife, and the remainder by some 12 friends (Gibson et al., 1943). The social factor test was a comparison of scores by Woodruff working alone and then, on the following day, throwing while “heckled” by Price, with an additional 20 runs obtained some 2 weeks later (Price & Rhine, 1944). The Humphrey and Rhine (1945) data are pooled results of two sets of 63 runs each, each set obtained with a pair of different-sized dice.


Control Tests 


The purpose of these tests was to establish the role of wishing for specified die faces (targets) to appear when the dice are cast. In Hilton, Baer, and Rhine (1943), for the control test with medium-sized dice (not included in Table 1), Ss were informed that this was to test the laws of chance; “this was done to prevent their attempting to influence the dice” (p. 175). In this series of 128 runs, the average fell to 5.03, giving an overall deviation from chance of +4 hits. However, if one compares this control series with those runs of the main series made with the medium-sized dice, the difference is insignificant, as Nicol has noted in the Ciba symposium (p. 34).
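The arithmetic behind the +4 figure, and its size relative to chance, can be sketched as follows. The sketch assumes that the control series was scored like the main series, with one High reading per pair of dice (p = 15/36, 12 pairs per run, chance expectation 5.00 per run). That scoring is an assumption, since the control protocol is described only briefly.

```python
import math

P_HIGH = 15 / 36    # P(sum of two fair dice >= 8)
PAIRS_PER_RUN = 12  # assumption: 24 die throws scored as 12 pairs
CHANCE_PER_RUN = PAIRS_PER_RUN * P_HIGH  # 5.00 hits per run


def deviation_from_average(runs, average_score):
    """Total hits beyond chance implied by an average run score."""
    return runs * (average_score - CHANCE_PER_RUN)


def critical_ratio(deviation, runs):
    """Deviation divided by its binomial standard deviation."""
    return deviation / math.sqrt(runs * PAIRS_PER_RUN * P_HIGH * (1 - P_HIGH))


if __name__ == "__main__":
    dev = deviation_from_average(128, 5.03)
    print(f"deviation = {dev:+.2f} hits, CR = {critical_ratio(dev, 128):.2f}")
```

An average of 5.03 over 128 runs corresponds to about +3.8 hits. This is consistent with the reported +4, given that the average is rounded to two decimals. A CR near 0.2 is well within chance.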

The study of the “social factor” compared results obtained by Woodruff working (and recording) alone with the results obtained when he was distracted by Price (who acted as recorder), and with those occurring in the normal (unheckled) situation with neutral observers present, obtained the preceding week (Price & Rhine, 1944). The last data, of course, fail to constitute an acceptable postexperimental control test. The “Minor Studies” involving pharmacological variables violate the simplest requirements of control studies, lacking pre- and postcontrol tests and “placebo” tests (Averill & Rhine, 1945; Rhine et al., 1945). Equally inadequate are the data obtained with hypnosis; e.g., design modifications were introduced
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1944b). In the Humphrey and Rhine 
(1945) all Ss preferred and used ex- 
clusively the six-face as target. In 
general, there was considerable varia- 
tion in target designation within the 
same test. Ss would be free to choose 
any desired face as target. When 
slumps would occur, E might suggest 
rate or manner of throwing be varied. 
For the most part, the six-face was 
the predominantly chosen target. In 
general, informality was the rule; a 
well designed, rigorously executed 
test was the exception. 


Recording Errors 


Little or no effort was made to in- 
sure accuracy in recording the ob- 
tained scores. It might be unfair to 
have required in these early tests 
completely objective records, e.g., 
photographic recording of all die 
faces on all throws. Certainly, it 
would not have been unreasonable to 
require two independent recorders 
for all die throws (cf. Dale, 1946; 
Dale & Woodruff, 1947). On the 
other hand, records made by a single 
E, aware of target designations, and 
especially those instances in which 
unwitnessed E was also S, give little 
confidence in the data. As noted in 
Table 1, four tests were entirely solo 
efforts. The hazard in the solo efforts 
of recording what is anticipated 
(wished-for target) is most critical. 
This difficulty is not minimized if 
only a small number of dice are cast; 
e.g., throwing one die or a pair of dice. 
In this circumstance it has been 
noted that “if there are no hits at a 
given throw, the recorder may tend 
to make no entry.’” 

Questions of accuracy of recording 
are equally serious in those cases in 
which many dice were involved. As 
indicated in Table 1, in the first Frick 


*D. Parsons, 
August 1961. 


personal communication, 
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series there were 24 dice per throw; in 
three other Duke exploratory series 
there were 96 dice to be counted on 
each box release. In the alcohol series 
(Averill & Rhine, 1945) each author 
ingested a cocktail concocted of 100 
cc of gin and an equal amount of a 
soft drink, during which one S “‘lost’”’ 
the drink during the experiment and 
JBR being able to retain it only until 
the tests had been completed. A 
third person present functioned as E 
to record hits. Averill and Rhine 
served again as Ss 2 days later in the 
caffeine study (Rhine, Humphrey, & 
Averill, 1945), the results of which 
are considered such as “‘to render any 
future discussion of the chance hy- 
pothesis unnecessary” (p. 87). It was 
noted Averill and Rhine “did not feel 
as alert as usual” (p. 81) and again 
the third person served as E. A com- 
bined or changing role in the experi- 
ment may interfere with reliable 
recording. With Woodruff as S, 
Price tried to distract him in“‘friendly 
rivalry” (Price & Rhine, 1944, p. 180) 
and also record the hits. In some 
tests, two individuals served, respec- 
tively, in roles of recorder and ob- 
server; yet frequently, these individ- 
uals also served as Ss (e.g., in Table 1, 
the three Gibson series, and the First 
Frick report). 


Pooling of Data 


The practice of pooling of data ob- 


tained under different conditions, 
statistically or experimentally, is not 
a defensible procedure. This was 
manifestly apparent in Rhine and 
Rhine (1943). There is a set of data 
in addition to that reported in Table 
1, for Mrs. Reeves working at home 
alone. With low dice as target, on 
435 runs the average score was 5.21 
(5.00 =chance expectations) or +92 
hits, and a reported, barely signifi- 
cant, CR of 2.58 (Reeves & Rhine, 
1943, p. 81). The authors, combining 








PSYCHOKINESIS (PK) 


both sets of data, report an excess of 
H and L hits of 382 (CR=7.34). In 
Hilton, Baer, and Rhine (1943), two 
graduating seniors interchanged as S 
and £E and were free to cast one or 
two of three pairs of different sized 
dice. Of the total of 484 runs, three- 
fourths were hand-thrown and the 
scores on the remainder (mechani- 
cally released) were a little better. 
Although it was stated that “‘optional 
stopping”’ was not involved, the tests 
were concluded when the Ss were 
graduated. The total deviation of 
+130 hits (484 runs) was a pooled 
score of trials with different sized 
dice, thrown in combinations of one 
and two dice, manually and mechani- 
cally. This total was reported signifi- 
cantly better than a nonwishing con- 
trol series of 128 runs (+4 hits) with 
medium sized dice alone. As Nicol 
has emphasized, in the Ciba sym- 
posium, the difference in score be- 
tween the control series and that in 
the main series obtained with the me- 
dium sized dice alone is statistically 
insignificant, ‘‘a modest 0.27” (Wol- 
stenholme & Millar, 1956, p. 34). 
More important, no distribution is 
given of the proportion of single die 
and two dice throws, mechanically 
released or manually thrown, sizes of 
dice, for each (or both) of the Ss. 
Distributions of scores—as High 
targets (8’s or higher) or otherwise— 
are totally lacking. 

The data in Table 1 for Hilton and 
Rhine (1943) are pooled results for 
Hilton, his sister, and his brother-in- 
law. It is admitted tht ‘In terms of 
controls, this research is less com- 
plete than is commonly the case”’ (p. 
204). The data in Table 1 for the 
Gibson Large Cup Series were con- 
tributed 14% by Gibson, 72% con- 
tributed by his wife, and the re- 
mainder by some 12 friends (Gibson 
et al., 1943). The social factor test 
was a comparison of scores by Wood- 
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ruff working alone and then throwing 
on the following day ‘‘heckled’’ by 
Price with an additional 20 runs ob- 
tained some 2 weeks later (Price & 
Rhine, 1944). The Humphrey and 
Rhine (1945) data are pooled results 
of two sets of 63 runs each, each of 
which was obtained with a pair of 
different sized dice. 


Control Tests 


The purpose of these tests was to 
establish the role of wishing for speci- 
fied die faces (targets) to appear 
when the dice are cast. In Hilton, 
Baer, and Rhine (1943), for the con- 
trol test with medium sized dice (not 
included in Table 1), Ss were informed
that this was to test the laws of chance;
“this was done to prevent their attempting
to influence the dice” (p. 175). In this
series of 128
runs, the average fell to 5.03, giving 
an overall chance deviation of +4 
hits. However, if one compares this 
control series with those runs of the 
main series made with the medium 
sized dice, the difference is insignifi- 
cant, as Nicol has noted in the Ciba 
symposium (p. 34). 

The study of the “social factor”
compared results obtained by Woodruff
working (and recording) alone with those
obtained when he was distracted by Price
(who acted as recorder), and with those
obtained the preceding week in the normal
(unheckled) situation with neutral
observers present (Price & Rhine, 1944).
The last data, of course, fail to
constitute an acceptable postexperimental
control test. The “Minor Studies”
involving pharmacological variables
violate the simplest requirements of
control studies, lacking pre- and
postcontrol tests and “placebo” tests
(Averill & Rhine, 1945; Rhine et al.,
1945). Equally inadequate are
the data obtained with hypnosis; e.g., 
design modifications were introduced 
during the course of this study, since
the “experiments did not turn out as
had been expected” (Rhine, 1946a,
p. 130).

The one test constituting a control 
on wishing is the Frick solo with 60 
dice/cup-throw in 1937. Working 
alone, he completed two series, the
first of which is reported in Table 1.
As indicated there, there were +582 hits
in the 2,172 runs for the six-face target.
A second series was then done in exactly
the same way, again recording 6’s (to
make for consistent recording), but
wishing part of the time for 1’s and the
remainder of the time for 6’s not to
appear. Under these conditions, for an
equal number of (2,172) runs, there were
+576 hits for the six-face, although S
was wishing for 1’s. The CR for the
excess 6’s (CR=6.77) was of the same
order of significance as that obtained
when wishing for 6’s in the first series.
These negative results evoked the
interpretation (Rhine &
Humphrey, 1945a) that there was
“no place in Frick’s personal philosophy
to accommodate the PK hypothesis” (p. 215);
“it appears that Frick must have, as it were,
pretty completely deceived himself in the
conduct of Series B...he was not well
unified in his motivational elements”
(p. 218). Therefore, if there
were as many hits in the (control) 
Series B as in the (experimental) 
Series A, thus proving the dice were 
biased, this “would solve any problem
raised by the first series and bring the
investigator the maximum peace of mind”
(p. 215). The positive results of Frick’s
first report (Rhine & Humphrey, 1944c)
were not questioned on motivational
bases. Such
post hoc rationalization is frequently 
to be noted, even in the Later Dice 
tests. Thus, the negative results
obtained in Schwartz’s (Table 2)
unwitnessed cup-thrown experiment,
planned in correspondence with the
Duke Parapsychology Laboratory,
were suitably rationalized by Rhine
(1946c, 1952), and other illustrations
are to be found in the analyses of 
declines and in the long, largely solo, 
series reported by Forwald (Table 4). 
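The CR quoted for Frick’s second series can be checked with a short sketch. It assumes the Duke convention of a run of 24 single-die throws with one target face (chance probability 1/6, i.e., 4.00 hits per run) and the usual normal approximation, CR = deviation / sqrt(npq). The helper name `critical_ratio` is illustrative, not taken from the original reports.

```python
import math


def critical_ratio(deviation: float, runs: float, p: float = 1 / 6,
                   throws_per_run: int = 24) -> float:
    """Normal-approximation CR: observed deviation divided by the binomial SD.

    Assumes every run consists of `throws_per_run` independent die throws,
    each with chance probability `p` of showing the target face.
    """
    n = runs * throws_per_run          # total die throws
    sd = math.sqrt(n * p * (1 - p))    # binomial standard deviation, sqrt(npq)
    return deviation / sd


# Frick, 2,172 runs in each series (figures from the text)
cr_series_a = critical_ratio(582, 2172)   # wishing for 6's
cr_series_b = critical_ratio(576, 2172)   # recording 6's while wishing for 1's
print(f"Series A CR = {cr_series_a:.2f}")
print(f"Series B CR = {cr_series_b:.2f}")
```

Under these assumptions the +576 hits of the control series give a CR of about 6.77, and the +582 hits of the wishing series about 6.84: the two are of the same order, as stated above.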


Probability Model 


Since, with the exception of the 
Frick-Solo Series, adequate control 
tests are lacking in this series of re- 
ports, the case for PK rests upon the 
obtained scores (target hits) which 
deviate markedly from the theoretical 
expectations predicted by the Theo- 
retical Model—in this case, the nor- 
mal probability curve. As noted 
earlier, the reported data of the 19 
relevant reports show an overall
+6,515 hits. What does this mean?
Although the published descriptions
of the test conditions leave much to
be desired, a tabulation discloses the 
following distribution of die faces as 
targets: 

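[Table: distribution of target die faces across the 19 studies, with columns Target, Number of studies, and Total runs. Only the study counts are legible in the source: 4, 1, 11, and 3, for a total of 19.]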


Generally, scoring was recorded in 
terms of hits—how many dice turned 
up with the wished-for face. The 
common practice was to score only 
the hits, that is only those faces which 
were specified as targets. For this 
reason, a further analysis is not pos- 
sible in those studies in which the 
targets were for Highs, Lows, or 
Sevens. However, some pertinent 
information is available in those cases 
in which the target was varied; i.e., 
Faces 1 to 6. 

For these three studies, the follow- 
ing was reported. In Gibson, Gibson, 
& Rhine (1943), Faces 2, 3, and 4 
were chosen as targets in about 
7-11% of the throws, Face 1 was
chosen in 17% of the throws, and 
Faces 5 and 6 served most frequently 
as targets, respectively, 27% and 
29%. In the 1944 report by the same 
authors (Gibson Machine), each die 
face was chosen as a target with equal 
frequency both by Gibson and his 
wife. A miscellaneous series of 429 
runs is omitted from the present con- 
sideration since nothing is reported 
on target faces for this sample. In 
the First Frick Report, Faces 1-2-3 
were chosen 4.7% and Faces 4-5-6 
95.3% of the runs, with the 6 face 
alone being selected on 75% of all the 
trials (Rhine & Humphrey, 1944c). 

In terms of all 19 studies, Faces 1-3 
served as target for about 13% of all 
runs. In all, six-face and sixes were 
specified as target for about 65% of 
all runs. Of a number of interesting
considerations, it is self-evident that
the most elementary requirement was the
equal representation of all six die faces
as targets in some randomized order and
the tabulation of all die faces on all
trials. There is
no need to make use of higher mathe- 
matics to conclude that biased dice 
could account for the obtained results. 
It is also important to note in this 
connection that Weldon’s report of a 
(nonwishing) long series of dice throw- 
ing was available years before the 
Duke tests were started. Several 
references are to be found for this 
test. The most common is given by 
Fisher (1938, p. 67) who reported 
that Weldon made a total of 26,306 
throws of 12 dice, scoring the number 
of times 5’s and 6’s appeared. There 
was an excess of +1,378 hits
(106,602 observed against a chance
expectation of 105,224 5’s and 6’s). This
analysis is to be found in all editions 
of Fisher’s now classic book which 
was first published in 1925 and was 
acknowledged in the very first report 
by Rhine and Rhine (1943). 
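Fisher’s figures for Weldon’s data can be reproduced directly: 26,306 throws of 12 dice give 315,672 single-die outcomes, and a 5 or a 6 has chance probability 1/3. The sketch below (variable names are illustrative) recovers the chance expectation of 105,224 and the excess of 1,378, and adds the corresponding normal-approximation CR, which the text itself does not state.

```python
import math
from fractions import Fraction

throws, dice_per_throw = 26_306, 12
observed = 106_602                       # 5's and 6's reported for Weldon (via Fisher)
p = Fraction(1, 3)                       # chance that one die shows a 5 or a 6

n = throws * dice_per_throw              # total single-die outcomes
expected = n * p                         # exact chance expectation
excess = observed - expected
sd = math.sqrt(n * p * (1 - p))          # binomial SD, sqrt(npq)

print(n, expected, excess)               # 315672 105224 1378
print(f"CR = {float(excess) / sd:.2f}")
```

Using exact fractions keeps the expectation an integer, so the printed excess matches the published figure digit for digit.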
Some years later, Pratt (1947a)
reported a nonwishing control series
in which all die faces were recorded
by an observer (cf. Table 2). Faces
1-2 scored negative deviations from
chance (combined = −348 hits),
Faces 5 and 6 gave positive scores
(combined = +239 hits), and the rank
order of the face scores was 6, 5, 4, 3,
2, 1. Pratt (1947a) acknowledged that
dice with excavated spots (used by
Gibson, for example) favor the higher
faces and admitted that “many PK
reports have presented results which
fit this pattern” (p. 55), emphasizing,
however, that “the best answer to the
biased-dice hypothesis came out of the
position effect analysis” (p. 56). And at
this time Rhine (1947) acknowledged 
that “The problem of faulty dice 
remained .... We recognized that 
we would have to conduct tests in 
such a way that bias in the dice 
would be equalized or controlled in 
some reliable manner. We decided to 
seek perfect experiments rather than 
perfect dice” (p. 99). By 1943 (Rhine,
1947), “We had, for instance, conducted
tests in which an equal number of throws
were made for each face of the die”
(p. 105). “But it was
the decline evidence that settled 
every wavering doubt in our minds 
about PK” (Rhine, 1943, p. 105). 
From the published evidence, as well 
as personal conversations with a 
number of interested parties, it is 
clear that the question of dice-bias 
was of considerable concern and only 
the post hoc analysis of decline in 
scoring proved self-convincing. The 
serious defects in the experimental 
design and execution of these studies 
cannot be ignored and one reviewer 
(Soal, 1948) concluded that these 
“Duke experiments seem to have 
fallen into pitfalls that an intelligent 
school boy would have avoided” (p.
185). None of these crucial weaknesses
in experimental design is rectified
by the postmortem report of a
significant decline in hits in these 
already completed studies. By the 
most elementary standards, this con- 
stituted a new hypothesis subject to 
subsequent test. 


Negative Data 


The group of reports just con- 
sidered is characterized by the ab- 
sence of negative results. Frick’s Solo 
test is the important exception and 
Rhine’s post hoc rationalization has 
been noted (Rhine & Humphrey, 
1945a). No statement has been pub- 
lished concerning the total number of 
tests carried out or the proportion of 
results which were negative. The first 
independent negative findings are 
contained in the final two reports 
listed in Table 1. 

In what was one of the earliest
attempts at wishing with dice, Nicol
carried out a series of tests from 
1934-37 (Nicol & Carington, 1947). 
This study was the first in which all 
throws were recorded and all faces 
were used as targets in a systematic 
fashion. Results on each of four 
series of tests were insignificant. Of 
the total throws given in our tabula- 
tion (5,640 die throws unwitnessed), 
the most useful are the witnessed 
throws of Group 4. Here, each of 
eight Ss made 2,400 throws successively
for each of the die faces as target, in
order from 1 to 6 (total = 14,400 die
throws/S). On these 115,200 throws,
the total deviation of +91 was of no
significance. The analysis of the data
by Carington included a detailed
examination for declines, but “all
results have been null” (p. 174). Of
the three analyses made, significance
is attached to one. However, Carington
noted that “if this work stood alone
[without Rhine’s reports], it would not
be sufficient to warrant the acceptance”
of the PK hypothesis (p. 175).
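The Group 4 arithmetic is easy to check: 8 Ss × 6 target faces × 2,400 throws gives 14,400 throws per S and 115,200 in all. A minimal sketch, assuming independent throws with chance probability 1/6 of hitting the current target, shows why a deviation of +91 is of no significance (the text gives no CR; the one printed here is computed under that assumption).

```python
import math

subjects, faces, throws_per_face = 8, 6, 2_400
deviation = 91                                # reported total deviation

throws_per_subject = faces * throws_per_face  # one block of 2,400 per target face
total_throws = subjects * throws_per_subject
p = 1 / 6                                     # chance of hitting the target face
sd = math.sqrt(total_throws * p * (1 - p))    # binomial SD of the hit count

print(throws_per_subject, total_throws)       # 14400 115200
print(f"SD = {sd:.1f}, CR = {deviation / sd:.2f}")
```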
The final report in Table 1 is the
Minor Study by Nash (1944). It was 
carried out in 1940 with 113 Univer- 
sity of Arizona students, in ignorance 
of the work that was being prepared 
for publication at Duke University. 
All six die faces were used as targets 
with equal opportunity for all faces to 
serve as target. For all Ss combined, 
there was a grand total of 8,136 
throws with an insignificant devia- 
tion of +39 hits. Nash and his wife 
then carried out a control series with 
6 dice/throw for a total of 18,144 die 
throws. The scores here for each of
Faces 1-3 were negative (combined =
−92), whereas the deviations for each
of the target Faces 4-6 were all positive
(combined = +92 hits), giving a remarkable
overall deviation of zero for the entire
control series.

Some overall generalizations are at 
once apparent with respect to these 
basic data. If the test of precognition 
(Woodruff & Rhine, 1942) and 
McDougall’s negative data (Rhine & 
Humphrey, 1943) are ignored, the 
positive data in Table 1 offer very 
impressive scores. But many of these 
high scores were obtained under pro- 
cedures and conditions which were, at 
best, informal. Only nine reports 
were carried out on the premises of 
the Duke Parapsychology Labora- 
tory. Of these, five are entitled 
Minor articles, the rationale for 
which classification is not given. The 
remaining 11 reports (including 
McDougall’s One Die) were carried 
out in dormitories and homes by 
students, professional and business 
people, and other interested ama- 
teurs. Four of these latter reports 
were based on data obtained by indi- 
viduals who worked alone, unsuper- 
vised, acting simultaneously as E, S, 
and recorder. Thus more than 50% 
of these data were obtained by ama- 
teurs lacking direct professional su- 
pervision. 
Note. (a) Used six dice, wishing only for three dice; (b) wishing for specified colored sides of cubes; (c) wishing for H(ighs) and L(ows) on 1's for different dice on each throw.
a Duke Report.
b Minor Report.
c E, as S, working alone.


Later Dice Tests 


The tests in this section were car- 
ried out after the development of the 
decline (in scoring) hypothesis. More
attention was given to experimental
design: to avoiding recording errors and
to ensuring tests with all faces (or
Highs and Lows) as targets in the same
study. In some tests, two observers
made independent records of
the scores and all faces thrown were 
recorded. Unwitnessed solo efforts 
were few. The reports by Schwartz, 
Mangan, Humphrey, and Thouless 
complete this category. In a few 
studies unsupervised students served 
as Es. Also included in this group is a 
series of tests of one individual, 
Blundin, discussed under Sensitives. 
A tabular summary of the design 
details, as well as the exact references 
are contained in Table 2. 

Of the 30 studies listed in Table 2,
9 originated in the Duke Parapsychology
Laboratory. The first 2 of these Duke
reports were carried out by unsupervised
students in dormitories (Gatling & Rhine,
1946; Herter & Rhine, 1945). Two others
were listed as Minor reports; the first
part of Mangan’s (1954) report was a solo
effort, and Pratt’s (1947a) test was a
nonwishing control series. The remainder
were the carefully executed
studies, negative in character, by 
Pratt and Woodruff (1946) and 
Van de Castle’s (1958) combined 
ESP-PK test, Humphrey’s interest- 
ing but inadequate Help-Hinder test 
(Humphrey, 1947a) and her High- 
Low solo effort (Humphrey, 1947b). 

The period is ushered in primarily 
by a number of independent attempts 
to confirm the early Duke results. 
The efforts of Nash (1944) and Nicol 
and Carington (1947) have already
been considered with Table 1. In line
with these findings are the negative 
results later obtained by Hyde (1945) 
and Parsons (1945). In addition to 
an insignificant deviation from 
chance, Hyde (1945) found no evi- 
dence for declines and concluded that 
“Positive PK results are not guaranteed
by repetition of the published American
[Duke] technique” (p. 296).
In a larger series, with more Ss and 
more than one technique, Parsons’ 
(1945) data were also negative in 
terms of deviation and hit distribu- 
tion (i.e., absence of declines) with 
the record sheet patterned on the 
Duke record form. Dice bias to 
higher faces was detected. Also to be 
noted from the London SPR group 
(not included in Table 2) was Scott's 
(1947) report of negative results with 
the Cambridge group and West's 
(1954a) report of negative results in 
other SPR unpublished data. 

In addition there was the series of
studies carried out by Rose and his
wife. In the first study (Rose, 1950),
E and his wife served as main Ss with
supporting data obtained from 21
friends. The dice were released down
an inclined plane when the observer
withdrew the ruler upon which the dice
had rested. All trials were witnessed,
but Ss “were allowed for the greater
part of the series to select their target
faces and consequently there were not
equal numbers of runs for each face
of the die” (p. 116). With arbitrary
stopping in addition, “to change to
more rigid and well-balanced procedures”
(p. 117), the “loose design of the
experiment” (p. 125) requires no further
comment.

Although the pooled deviation was 
insignificant (p=.16), the wife’s re- 
sult was considered significant, with 
an overall score of +83 hits on a total
of 331½ runs. Her score of +85 hits in
one section of 125 runs, when the six-face
was the target, should be noted. The
absence of separate face scores prevents
pinpointing the source of the +41 hits
scored by E (who preferred the five-face
as target) on his 393½ runs, whereas the
other two main Ss (total 256½ runs) and
the remaining miscellaneous friends
(total 375 runs) all scored overall minus
deviations.
A control series of 400 runs by a 
friend was reported in which there 
was a minus deviation on each of 
Faces 1-4 with +25 hits on Face 5 
and +28 hits on Face 6. Dice bias 
was clearly confounding the experi- 
ment, a possibility which should 
have been obvious to the authors 
from the already published studies. 

In a combined ESP-PK study
(Rose & Rose, 1951), colored cubes 
(without indentations) “accurately 
shaped by an engineer” (p. 129) were 
used in the latter phase with 20 
Australian aborigines. Although 
there was optional stopping, each S 
had a minimum of 24 runs, with an 
equal number of throws for the sev- 
eral faces. Rose supervised and 
called the results which were checked 
by his wife who recorded only hits. 
Although one S had marginal success
(+108 hits on 600 runs), scores for the
individual face colors were not reported.
The CR of 1.61 for the pooled data 
was insignificant. 

A further study (Rose, 1952) was 
now carried out with Duke dice, using 
25 Ss, and using all faces as targets, 
again in a combined ESP-PK test.
Three sets of data were obtained with
the dice in different locales, the
results of which were all insignificant:
+128 hits on 1896 runs, −27 hits on
168 runs, and +7 hits on 1128 
runs. The pooled data, presumably 
for all 25 Ss combined, were equally 
insignificant. 

In a follow-up (Rose, 1955), again
as part of a combined ESP-PK study,
aboriginal Ss threw from a hand
shaker in a setting in which competition
was encouraged, with only hits being
recorded. Deviations for all Ss,
including the best previous S (Rose, 1952),
were insignificant, the pooled data 
(312 runs) totaling +6 hits 
(CR=.12). 

A series of night tests was reported 
by McConnell (1955), in which the 
targets were wished for by Ss before 
going to sleep. Scores for the ma- 
chine-rotated pair of dice were photo- 
graphically recorded. Several target 
arrangements were used, e.g., doubles, 
as well as a sequence involving all 
faces. McConnell’s score was re- 
ported as significant (+66.5 hits) 
but the overall tabulation of +73.5 
hits for 18,092 die throws of all Ss 
combined was not significant (p = .14). 
The most satisfying experimental
component consisted of tests in which
each S selected a different given
target face, 1 to 6, for each succeeding
night. The score for the pooled data
for this organized section was −26.3
hits for the 8,534 die throws
(CR=.76).
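The two probability statements for McConnell’s night tests can be related with a short sketch. It converts a deviation on a number of die throws into a CR, assuming a chance probability of 1/6 per throw (an assumption: the mixed target arrangements may not all have had that probability), and then into a two-tailed normal probability through the complementary error function. The helper names `cr` and `two_tailed_p` are illustrative.

```python
import math


def cr(deviation: float, throws: int, p: float = 1 / 6) -> float:
    """Critical ratio for a hit count over independent die throws."""
    return deviation / math.sqrt(throws * p * (1 - p))


def two_tailed_p(z: float) -> float:
    """Two-tailed normal tail probability, P(|Z| >= |z|)."""
    return math.erfc(abs(z) / math.sqrt(2))


overall = cr(73.5, 18_092)       # all Ss combined
organized = cr(-26.3, 8_534)     # one target face per night, faces 1 to 6
print(f"overall:   CR = {overall:.2f}, p = {two_tailed_p(overall):.2f}")
print(f"organized: CR = {organized:.2f}, p = {two_tailed_p(organized):.2f}")
```

Under the 1/6 assumption the overall +73.5 hits give p ≈ .14 and the organized section gives |CR| ≈ .76, agreeing with the reported values.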

Overlapping with the night tests, 
was the day series carried out in 
1948-50 with photographic recording 
of 393 wishing Ss. The results by the 
criterion of target hits were entirely
negative (McConnell et al., 1955). 
This study with the Duke dice ma- 
chine was repeated by Dale and 
Woodruff in 1951-52 in a combined 
PK-ESP test with 108 Ss and 62,208 
dice readings and the results were 
insignificant with respect both to hits 
and declines (Murphy, 1952b). On 
the PK test, the overall deviation was
−13 hits, “fantastically close to
chance deviation,” with zero deviation
on the first half of 31,104 die readings
(Laura Dale, personal communication,
April 1959).

Finally, there is the extremely long
series of throws (not in Table 2)
reported by Scarne, an avowed and
scornful skeptic. His criticisms of the 
use of ordinary commercial dice, and 
the care with which his dice were 
measured for accuracy before ac- 
ceptance are extremely pertinent 
(Scarne, 1956). Precision measuring 
instruments were used to select true 
dice and each pair was discarded after 
every 3,600 throws. Scarne, wishing 
for 7’s, scored for Lows (2, 3, 4, 5, and 
6), Highs (8, 9, 10, 11, and 12), and 
7’s. Beginning in 1940, he continued
as opportunity permitted for about
15 years, stopping after 6,000,000
rolls. By chance there should have
been 1,000,000 7’s and 2,500,000
hits each for Lows and Highs.
The final tabulation reported was 
2,499,998 Lows and 2,500,001 Highs, 
the remainder turning up as 7’s. The 
significance of the difference between 
these results and those obtained by 
Weldon (Fisher, 1938) whose dice 
were obviously high-score biased, is 
readily apparent. 
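The chance expectations for Scarne’s three categories follow from the 36 equally likely outcomes of a pair of true dice: 6 give a sum of 7, and 15 each give a Low (2 to 6) or a High (8 to 12). The sketch below derives these counts by enumeration, scales them to 6,000,000 rolls, and takes the 7’s as the remainder of the reported tallies, as the text implies.

```python
from fractions import Fraction
from itertools import product

rolls = 6_000_000
outcomes = [a + b for a, b in product(range(1, 7), repeat=2)]  # 36 equally likely sums

prob = {
    "Lows (2-6)": Fraction(sum(s < 7 for s in outcomes), 36),
    "Sevens": Fraction(sum(s == 7 for s in outcomes), 36),
    "Highs (8-12)": Fraction(sum(s > 7 for s in outcomes), 36),
}
expected = {k: int(rolls * v) for k, v in prob.items()}
print(expected)  # {'Lows (2-6)': 2500000, 'Sevens': 1000000, 'Highs (8-12)': 2500000}

# Reported tallies; the 7's are the remainder of the 6,000,000 rolls
lows, highs = 2_499_998, 2_500_001
sevens = rolls - lows - highs
print(lows - expected["Lows (2-6)"], sevens - expected["Sevens"],
      highs - expected["Highs (8-12)"])
```

The deviations from chance are a few hits in each category out of millions, which is the contrast with Weldon’s high-biased dice drawn in the text.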

This period also saw more attempts
to test for the role of a psychological
factor in dice throwing.
Two such series are those of Dale 
(1946) and Nash (1946) which have 
been interpreted as evidence for PK. 

The four studies at the ASPR will 
be considered as a unit (Dale, 1946; 
Dale & Woodruff, 1947). The initial 
study, with positive findings, was 
carried out in 1946, and the other 
three, with negative results, were 
completed in 1947. In the first study 
(Dale, 1946) each of 54 Ss (29 female, 
25 male) had one session in which 
dice shaken in a cup were cast down a
chute, landing on a platform. The 
targets, respectively, were in the fol- 
lowing progressive sequences: Faces 
1-6, 2-1, 3-2, etc. The Ss were 
divided into six equal groups, with 
one of the target sequences assigned 
to each of the subgroups. There was 
an equal number of trials for each 
target face. Records were kept of
every die face cast. Both S and E 
kept records and any discrepancy, 
during comparisons after each run, 
was resolved by taking the lower 
score. A dozen sessions or so were 
witnessed in whole or in part by ob- 
servers.

The Ss comprised two groups, 
those who accepted the possibility of 
the influence of mind over matter 
(N=41) and those who did not 
(N=13). The difference between the 
Believers with a mean of 4.117 (+116 
hits) and the Nonbelievers with a 
mean of 4.176 (+55 hits) was sta- 
tistically insignificant. Declines in 
scoring were noted from run to run on 
the data page. There was an incline 
(increase in scoring) from page to 
page, resulting in twice as many hits 
on the last three pages as on the first 
three pages. It was noted by Dale
that in Nash’s (1946) study “as in
our experiment, both groups scored
well above chance” (p. 144) and that
the “attitude expressed by the S
toward the possibility of PK was a
variable of no importance” (Dale,
1946, p. 132).

In the second and third studies 
(Dale & Woodruff, 1947) both Es 
made independent records. In the 
second study (24 Ss) there was also an
electrically operated randomizing 
chute and photographic recordings of 
die faces. In the third test (54 Ss), 
the electrical gadgets were elimi- 
nated. In the fourth, and final study, 
Dale worked alone to replicate com- 
pletely the conditions of her first 
study. The results of Experiments II, 
III, and IV were entirely negative. 
Analysis of the throws in these three 
tests showed that Faces 5 and 6 al- 
ways gave positive deviations and 
Faces 1-2 always scored negative 
deviations. The striking declines in
Experiment I were entirely absent in
II and III, with an incline detected
in IV, opposite to the effect obtained
in I. There was no increase in the
number of hits in the last three data
pages of Experiment IV, such as was
noted in Experiment I. There was no support
for the hypothesis derived from Ex- 
periment I that females score better 
than males. The most likely inter- 
pretation, therefore, of the positive 
results in Experiment I is the same as 
that concluded for the three subse- 
quent negative studies: that “no 
clear evidence for the operation of 
psychokinesis was found” (Dale &
Woodruff, 1947, p. 79). 

The initial study by Nash (1944) 
was followed by several subsequent 
tests which were essentially psychological
in design. Because they were less well
controlled than some others, such as Dale
(1946, 1947) and Van de Castle (1958), and
used undergraduate students as Es, less
confidence can be placed in the reliability
of these data.
A test of Believers versus Nonbe- 
lievers (Nash, 1946), was carried out 
by an undergraduate student who 
counted and recorded the hits. The 
scoring was confirmed by S. About 
one-half of the rolls were made at a 
distance of 3 feet from the release 
box, the remainder from a distance of 
30 feet. The number of runs rolled by 
each S varied from 48 to 176, and the 
only data reported were for the 
group as a whole. Because of Ss’ 
target preferences, there were 112 
runs for Faces 2 and 5; 128 for Faces 
1, 4, and 6; and 144 for Face 3. For 
the group as a whole, the pooled data 
showed positive deviations for all 
faces. A typical QD (greatest differ- 
ence between first and fourth quarter, 
diagonal decline) was reported but 
the difference was not statistically 
significant. Mean for Believers was 
4.43 and that for the Nonbelievers 
was 4.34 (chance=4.00/run) with 
the difference reported as insignificant.
No significant differences occurred
between near and far distances. This
study (Nash, 1946) was
quoted by Rhine as evidence for Ss 
doing as well at far distances as com- 
pared to near distances (Rhine, 1953), 
but the negative results on distance 
reported the following year (Nash & 
Richards, 1947) were ignored. This 
latter follow-up was carried out by 
Richards, as an undergraduate stu- 
dent in zoology. Each S had two 
tests, a month apart, for a total of 32 
runs, with split halves of Ss arranged 
for R(eward) and No R(eward) at 3- 
feet and 30-feet distances from ap- 
paratus. Complete randomization of 
targets was not obtained. 
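The unequal target representation in Nash (1946) is easy to quantify from the run counts given above. A short sketch (the dictionary layout is illustrative) totals the runs and shows each face’s share against the 1/6 (about 16.7%) expected under equal representation.

```python
from fractions import Fraction

# Runs per target face in Nash (1946), as reported in the text
runs_by_face = {1: 128, 2: 112, 3: 144, 4: 128, 5: 112, 6: 128}
total_runs = sum(runs_by_face.values())
equal_share = Fraction(1, 6)            # share per face under equal representation

for face, runs in sorted(runs_by_face.items()):
    share = Fraction(runs, total_runs)
    print(f"Face {face}: {runs:3d} runs, {float(share):.1%} "
          f"(equal share {float(equal_share):.1%})")

print("Total runs:", total_runs)        # Total runs: 752
```

Face 3 received 32 more runs than Faces 2 and 5, so pooled hits mix genuinely different target exposures.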

There was a total of 256 runs for 
each of the six target faces and the 
overall score of +158 hits (p=.014) 
was considered suggestive. The
scores on the several target faces
were: Faces 2 and 3 combined = +43
hits, Face 1 was −23, and Faces 5
and 6 combined were +136, with zero
deviation for Face 4. But Nash and
Richards (1947) stated that the “hypothesis
of dice bias cannot account for the
results” (p. 274). On the distance test,
there were +39 hits at the 3-foot distance
and +119 hits at the longer distance, but
the authors concluded that one cannot say
that this “gave a conclusive difference”
(p. 279). Decline in scoring was not
typical; instead, there was a general
tendency toward inclines.
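The reported p=.014 for +158 hits appears to correspond to a one-tailed test. A minimal sketch, assuming 24 single-die throws per run (chance 4.00 hits per run, as stated for the Nash studies) and the normal approximation: 6 × 256 = 1,536 runs give a CR of about 2.21, whose one-tailed probability is about .014, while the two-tailed value would be about .027. The check at the end confirms that the near and far distance scores sum to the overall deviation.

```python
import math

runs = 6 * 256                    # 256 runs for each of the six target faces
throws = runs * 24                # assumption: 24 die throws per run
deviation = 158
sd = math.sqrt(throws * (1 / 6) * (5 / 6))
z = deviation / sd

one_tailed = 0.5 * math.erfc(z / math.sqrt(2))   # P(Z >= z)
two_tailed = math.erfc(z / math.sqrt(2))         # P(|Z| >= z)
print(f"CR = {z:.2f}, one-tailed p = {one_tailed:.3f}, two-tailed p = {two_tailed:.3f}")

assert 39 + 119 == deviation      # near (3 ft) plus far (30 ft) hits
```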

In what was essentially a Wish and 
No Wish test, data were collected in 
1948 by Bray, an undergraduate 
student assistant (Nash, 1956). Tar- 
gets were selected by card-drawing 
from a shuffled deck (numbered 1 to 
6). There were six dice/throw, three 
red and three white on every release. 
In Series 1 and 2, S selected which set 
(color) was to come up with the given 
target face. The pooled data for all 
three series were given in terms of 
the score for the (three) wished-for 
dice, the deviation being insignificant.
Inadvertently or not, there was an 
obvious built-in control on wishing. 
If one compares Series 1 and 2, one 
has a direct test of the wished-for and 
ignored sets of three dice. The scores 
for the two sets were, respectively,
−25 hits and −41 hits. Presumably
all six faces served as target, although 
target order and face distributions 
were not reported and the obvious un- 
importance of wishing was ignored. 

Also to be noted under the psycho- 
logical rubric were two Duke studies 
by Humphrey, with an interesting 
design feature but unfortunately 
marred by serious weaknesses. In the 
Help-Hinder series (Humphrey, 
1947a), with “three different Ss participating”
(p. 4), a thrower wishing
for self-chosen target was helped or 
hindered by the wishing of an ob- 
server, as decided by the latter. In 
the first situation, the latter wished 
for S’s target: in the second, the ob- 
server wished for some nontarget 
face. S was presumed ignorant of the 
observer's attitude (help or hinder). 
But since it required more time to
record scores for both S and observer
when the latter was hinder-wishing (i.e.,
two entries), S must have had some
inkling of the observer’s attitude. Only
hits were recorded by E, so that face 
distribution for all throws is not 
known. The number of runs for 
given targets, as chosen by S, was not 
equal, nor was the experiment of a 
predetermined length. Dice bias was 
considered irrelevant since positive 
deviations occurred on all six faces 
serving as targets. 

On the equivalent of 177 runs, ob- 
server wishing for S’s target, the 
score was +95 hits. On 213 hinder- 
runs S scored +32 hits, whereas 
observer scored +12 hits on her self- 
chosen (different) target on the same 
throws. Combining the scores on the 
hinder trials, as Humphrey submits, 
the score becomes +44 hits on 426
runs, which, when combined with the
help runs, constitute the totals given
in Table 2 (+139 hits on 603 runs).
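The bookkeeping behind the Table 2 totals for the Help-Hinder series can be laid out explicitly. The sketch below uses only the figures given in the text: each hinder run is scored twice (once for S, once for the observer), which is how 390 actual runs become 603 scored runs, and the two hinder scores combine to +44, giving the grand total of +139.

```python
# Figures from the Help-Hinder description (Humphrey, 1947a)
help_runs, help_hits = 177, 95           # observer wishing for S's target
hinder_runs = 213                        # observer wishing for a different face
hinder_hits_s, hinder_hits_obs = 32, 12  # same throws, scored for each wisher

actual_runs = help_runs + hinder_runs            # runs physically thrown
scored_hinder_runs = 2 * hinder_runs             # each hinder run scored twice
scored_runs = help_runs + scored_hinder_runs     # total entered in Table 2
hinder_hits = hinder_hits_s + hinder_hits_obs
total_hits = help_hits + hinder_hits

print(actual_runs, scored_hinder_runs, scored_runs)  # 390 426 603
print(hinder_hits, total_hits)                       # 44 139
```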

The actual number of runs was 390, 
but with each run scored twice (S and
wisher) the reported total was 603, 
with plus deviations on all die faces 
as target for a total of +139 hits 
(Table 2). It is unfortunate that the 
experimental design was not more 
rigorous, with equal representation 
for all faces as target in some ran- 
domized order and two independent 
recorders ignorant of target designa- 
tions. The possibility of recording 
errors in this situation constituted a 
serious weakness. Equally critical
was the failure to incorporate some
balanced order of wish (help or
hinder) with nonwish sequences.

In a simultaneous High-Low test 
Humphrey (1947b), working alone as 
S, E, and recorder, used six red and 
six white dice on every throw. Always
scoring for 1’s, S for one-half of the
runs wished the red to be high and the 
whites to be low; and on the remain- 
ing runs wished for the white to be 
high and the red to be low—in all 
cases with respect to 1's. Overall 
score on high wish was +45 hits 
(insignificant) and for low wishes
−179, the latter reported as significant.
An unavoidable change in dice
occurred during the study. Unfor- 
tunately all experimental functions 
were performed by one person. Aside
from the problem of trying to keep
different mental sets for the two colors
of dice, the tendency to read the dice
incorrectly and the absence of an
independent (naive) recorder (who did
not know the mental sets of S) were
irreparable flaws.

In this connection, the carefully 
controlled study on personality and 
PK reported by Van de Castle (1958) 
should be noted. In this study, carried
out by White (who has reported a series of
successful ESP tests), a variety of
personality tests were given in
conjunction with ESP cards and drawings,
as well as a PK dice test. No
relationship was reported between 
performance and any of the tests, 
including sheep-goat dichotomy 
(Schmeidler), expansive-compressive 
ratings (Humphrey), Rosenzweig 
picture-frustration, Rorschach, or IQ 
scores. All Ss had equal opportunity 
for all die faces as target in a prede- 
termined order not known to E (S 
having been instructed not to reveal 
target until the test was completed). 
The total deviation of +32 hits for 
the 31 Ss was insignificant, with an 
overall incline of +26 hits from first 
to fourth quarter. “Thus the usual
analysis of the data from this experiment
offer no support for an interpretation of
PK” (Van de Castle, 1958, p. 136).

There remain to be considered a
few reports, of slight consequence,
which can be briefly summarized.
With a fellow undergraduate as S,
Herter served as recorder. With
equal representation of all faces as
target, S was soon free to begin with
any face and choose any order of
targets. Positive deviations were
recorded for all target faces. The
experiment was halted when S left
school. “It was the first and only
experience with PK testing for both
C. J. H. and his S” (Herter & Rhine,
1945, p. 24). Rhine commented that
there is no guarantee that the results
could be duplicated by them, “for
their motivation would not be the
same” (p. 24).

Gatling, another unsupervised
student, acted as E in some sessions and
as S in others (Gatling & Rhine,
1946). In this study, four amateur
gamblers were matched against four
ministerial students, one of whom was
Gatling. At the beginning, Ss were
free to pick from three sets of dice and
to select the target at the beginning of
each column of the score sheet. Later
in the experiment, target choice was
curbed, since it was intended to
equalize the number of throws for each
face as target. There was large
variation in the number of runs
contributed by each S. The combined
total for the four Ss “regarded as
successful at the crap game” (p. 120)
was 540 runs, compared to the combined
total of 702 runs for the four
ministerial Ss. With all Ss combined,
there was a plus deviation on each die
face as target. It was reported that the
decline [QD of the page] was not
typical. This experiment covered a
2-week period in June “and was
terminated by W. G.’s departure from
the university” (p. 12).

The penultimate study by Gibson
(1947) consisted of an “informal
interim report,” with apparatus
similar to that of Dale (1946). Half the
data were obtained in one long session
and the remainder were collected in
two successive monthly tests. The
statistical significance attached to the
pooled data (Table 2) is due to the
score on the second part (+52 hits on
72 runs) of the first session, in which
the scores were checked by S and the
wives of S and E. This session was
started with S and E having a beer, S
feeling relaxed as the test began. In
the final study of Gibson (1948), with
independent recording by S and E
and an orderly arrangement of target
sequence, there was an overall
insignificant deviation from chance and
an absence of typical decline.

The unwitnessed husband-wife
team of Vasse and Vasse (1951)
reported significant results only for the
wife, who “had a marked effect” in a
previous attempt to influence the
growth of seedlings (p. 264). Her
scores of +69 on 135 runs for High
and +45 on 111 runs for Low targets
account for the significance of their
pooled data in Table 2.

In an exploratory ESP-PK study,
Osis (1953), his wife, and a friend served
as Ss in a series of single die throws for
hidden targets. The friend, working
alone and recording her own data,
completed her final trials on a
vacation, after a lapse of some time. With
all data pooled, p is given as .0004.
There was inequality in targets, the
number of trials was not the same for
all Ss, nor was the same die used
throughout. Because of the
exploratory nature of this effort, no attempt
was made to use “full precautions of
standard test conditions” (p. 301).

In this category belongs Mangan’s
(1954) first published report, in which
Mangan worked alone for the first half
of the test and with an observer who
recorded during the second half. In a
preliminary test, there were negative
deviations for Faces 1-3 and positive
deviations for Faces 4-6. For the main
experiment, “it was predicted that the
scores on the high dice would be higher
than those on low dice” (p. 210).
Overall, High Face score = +242 and
Low Face score = +45. Typical decline
(QD) was absent.

In Knowles (1949), a statistically 
significant result was obtained by 
pooling data from a preliminary test 
(not in Table 2) with the main series. 
These combined data were statisti- 
cally indistinguishable from a subse- 
quent control series with a single 
(actually) loaded die weighted on 
Faces 4-6. 

In the test by Pratt and Woodruff
(1946), a copy of the experimental
design was deposited with the
librarian before the tests were started.
With equal representation of all the
die faces as target, the dice were
released from rotating tubes against a
barricade. The results were not
different from chance. The scores for
Faces 1-2 totaled a minus deviation,
with the largest positive scores obtained
on the higher target faces. This
carefully executed study was labeled
Minor.
Belonging in this category is
Thouless’ (1951) unwitnessed, solo, and
casually executed effort, “fitted into
odd times when I happened to have a
spare twenty minutes” (p. 108),
which deserves mention even if only
marginal significance is attributed to
the data pooled from different design
conditions. It is one of the two
reports on record incorporating, at
least in part, a Latin square design for
target designations (also cf. Thouless,
1945a, 1945b).


WISHING WITH OBJECTS OTHER THAN DICE


In Table 3 are summarized the few 
reported tests making use of objects 
other than dice, primarily discs and 
coins. 

In the first of the three studies
with discs, McMahan (1945) (a) tested
college students individually and then
(b) tested a group of children at
adolescent PK parties in the home of
one of the Ss, with prizes awarded.
In a, the main test involved plastic
discs which were released through a
system of baffles onto a table. Ss
wished for the objects to come to rest
with a designated face up. In b, the
group helped in concentrating. The
scores in both parts of the study were
statistically insignificant. The second
study was essentially like the group
party-situation already described
(McMahan, 1946). Again, the overall
deviation was insignificant
(CR = .67), but significance was
attributed to the difference between
first and fourth quarters of trials.

In the final study (McMahan,
1947) more or less the same
experimental situation obtained, with an
equal number of trials in a normally
lighted room and in a dark situation in
which scores were recorded by
flashlight. Overall deviation was again
insignificant, but the light-room score
was -61 hits and the dark-situation
score was +54 hits, with a reported CR
of 2.45. “Ss on the whole preferred the
dark” (p. 51), and there were reported
striking declines in scoring rates for
first and second parts. Gibson has
reported opposite results in a dark test.
With a social situation involving
children and adolescents and flashlight
scoring in darkness, it would appear to
be mandatory to have two independent
recorders, not bothered with handling
the experimental details.
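The reported CR of 2.45 is consistent with a critical ratio for the difference between the dark and light deviations rather than for either score alone. A minimal sketch, assuming a chance probability of 1/2 for a two-faced disc and assuming that the 8,800 items Table 3 appears to list for McMahan (1947) were split equally (4,400 each) between the two conditions; the function name `cr_of_difference` is ours, not McMahan's:

```python
import math


def cr_of_difference(dev_a: float, n_a: int, dev_b: float, n_b: int, p: float) -> float:
    """Critical ratio for the difference between two independent deviations.

    dev_a, dev_b: observed minus expected hits in each condition.
    n_a, n_b: number of items (trials) in each condition.
    p: chance probability of a hit on one item.
    """
    q = 1 - p
    # Under the binomial null each deviation has variance n * p * q
    sd_diff = math.sqrt(n_a * p * q + n_b * p * q)
    return (dev_a - dev_b) / sd_diff


# Assumed inputs: dark +54, light -61, 4,400 items each, p = 1/2 for a disc
cr = cr_of_difference(54, 4400, -61, 4400, 0.5)
print(f"CR = {cr:.2f}")  # CR = 2.45
```

Either deviation alone gives a much smaller ratio (54 / sqrt(1,100) is about 1.63), which is why the reported 2.45 is best read as referring to the light-dark difference.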

Solo and unwitnessed, Thouless
(1945a, 1945b) did his coin tossing
from a ruler, on which coins were put
in random order. Surprisingly enough,
although mathematicians and
statisticians have traditionally
employed coin tossing in relation to the
binomial theorem, it has been little
used in the present connection. Ten
coins were arranged half heads, half
tails on a ruler (in mixed order), and
Thouless did not look at them when
tossing. Overall, there were,
respectively, -16 heads and +58 tails for
each side as target. Thouless (1945b)
remarked that “It is obvious that the
result is not of any value as
independent evidence for PK” (p. 169).
Working alone, Bailey (Pope, 1946)
glass-tossed a penny onto a rug for 100
trials per session, and also made a
series of single-die throws. The
journal (Pope, 1946), commenting
that these attempts “offer suggestive
data on the comparative success of
dice and disks in PK experiments”
(p. 213), presents a combined score
for the two procedures with a p of
.012. The report by Binski (1957) was
the thesis for which the PhD was
awarded. The results with coin
tossing and roulette-wheel guesses for
red or black were insignificant. An
additional series with one S (not in
our tabulation), which had no
prearranged experimental plan and
involved optional stopping, was
considered highly significant; and some
new unpublished tests were “encouraging
but not statistically significant” (p. 290).

TABLE 3
WISHING WITH OBJECTS OTHER THAN DICE

Author | Reference | Objects | Ss | Target | Items per throw | Number of items
McMahan^a | (1945) | discs | 46 | side | 20 | 46,000
 | | discs | 27 | side | 10 | 13,500
McMahan^a | (1946) | discs | 21 | side | 10 | 12,600
McMahan^a | (1947) | discs | 22 | side | 10 | 8,800
Thouless^c | (1945b) | coins | 1 | H, T | 10 | 4,000
Pope (Bailey^c) | (1946) | coin | 1 | H, T | 1 | 1,000
 | | die | 1 | 1-6 | 1 | 216
Binski | (1957) | coins | ? | H, T | 100 | 153,000
 | | roulette | ? | red, black | 1 | 26,200

^a Duke Report.
^b Minor Report.
^c E, as S, working alone.


PLACEMENT-WISHING 


In this approach, released objects
were wished to land in designated
areas. The reports, beginning in
1951, are listed in Table 4.

The first placement test, an
innovation in PK research, was done by a
businessman, Cox (1951). It
consisted of a combined test of hits for
given target faces and placement
scores in 252 marked-off squares,
numbered from 1 to 6 in such fashion
that no two adjacent areas were given
the same number. There were several
series under varying conditions, only the
third of which Cox considered
experimentally adequate. The two
objectives, hits and area, were alternated
as primary and secondary targets, and
S was instructed to concentrate on a
given primary target and ignore the
other. E always called aloud the
score for the primary target, counting
the score for the secondary objective
silently. But scoring was always done
first for hits on die faces, whether it
was the primary or secondary target.
Target designation and scoring were
done by E, who progressed from one
to six die faces as targets.

Combining the scores on dice hits
and area placement into primary and
secondary targets, the 1,632 standard
runs (number of item throws divided by 24)
yielded scores of +100 for the
primary (CR of 1.36) and -426 for the
secondary targets (CR of 5.78). In
Series III, the only positive deviation
was +18 hits when the area was the
primary target. A significant score of
-139 hits was reported when the
respective objectives (hits or
placement) were secondary targets
(576 runs in each case).
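The CRs quoted for Cox's pooled data follow from the usual binomial formula CR = deviation / sqrt(npq), where n is the number of item throws (standard runs times 24). The sketch below assumes p = 1/6 for both die-face hits and area placement (the 252 squares were numbered 1 to 6, so each target number covers one sixth of the squares); that assumption is ours, but it reproduces the reported values:

```python
import math

RUN_LENGTH = 24  # item throws per standard run


def critical_ratio(deviation: float, runs: int, p: float = 1 / 6) -> float:
    """Binomial CR for a deviation observed over a number of standard runs."""
    n = runs * RUN_LENGTH
    return deviation / math.sqrt(n * p * (1 - p))


primary = critical_ratio(+100, 1632)    # primary targets
secondary = critical_ratio(-426, 1632)  # secondary targets
series_iii = critical_ratio(-139, 576)  # Series III, secondary targets
print(f"{primary:.2f} {secondary:.2f} {series_iii:.2f}")  # 1.36 -5.78 -3.17
```

Here 1,632 runs correspond to the 39,168 item throws entered for Cox (1951) in Table 4; the sign of the CR simply follows the sign of the deviation.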

Considering the difficult recording
task and the absence of independent
checking, one need not be too
concerned with the large negative
deviations noted when the given target
area or die faces were the secondary
target. Details are lacking on the
distribution of scores for hits or area
placement, and only the pooled mean
scores for primary and secondary targets
were submitted. E recognized that
the experiment (Cox, 1951) was
equivalent to “unwitnessed observations
and recording” (p. 43).

The second report by Cox (1954)
belongs with the present group in
terms of general technique. On every
trial, 24 dice and an equal number of
marbles were released, for a placement
test alone, into an area divided in half
by a fine wire, much like the Cormack
apparatus (cf. Pratt, 1951). With
varying numbers of Ss in three series of
tests, carried out at different times
over a period of 2 years, S attempted
to wish for hits in a specified section.
Unlike the first study, target and
nontarget areas were alternated on
successive trials. The reported pooled
data are given in Table 4.

To test the difference between
dice and marbles, Ss concentrated on
both types of objects in the first
series, on dice or marbles in the
second series, and on one or the other,
as preferred, in the final series. Primary
and secondary targets were scored in
the last two series but, unlike the first
report, the difference was
insignificant. Throughout, the score with
marbles was positive (+102 area
hits) and that with dice (consistent
with the first report) was always
negative (-162 area hits). All these data
were pooled, giving a CR of 2.93 for
the difference between dice and
marbles (p = .003). The experimental
inadequacies of the initial effort were
equally prevalent in this follow-up by
Cox, and no subsequent attempt has
been made to replicate these findings
under carefully controlled conditions.
Instead, a report of an “exploratory
nature” was made 5 years later on a
new technique, i.e., a three-dimensional
placement test (cf. Cox, 1959).
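The p of .003 attached to Cox's CR of 2.93 is what a two-tailed normal approximation gives. A minimal sketch using only the standard library (`math.erfc`); whether Cox used a one- or two-tailed test is not stated, and the two-tailed choice here is an assumption that happens to match the reported value:

```python
import math


def two_tailed_p(cr: float) -> float:
    """P(|Z| >= |cr|) for a standard normal Z, i.e. erfc(|cr| / sqrt(2))."""
    return math.erfc(abs(cr) / math.sqrt(2))


p = two_tailed_p(2.93)
print(f"p = {p:.4f}")  # p = 0.0034
```

A one-tailed test would halve this value, to about .0017.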

Following a suggestion from Pratt,
Cormack, a retired businessman
working solo, carried out a number of
placement tests (Pratt, 1951). With
variations in the size, weight, and
number of objects, releases were
made manually or electrically down
an inclined plane into two
horizontally divided areas. S willed for the
dice to end in one or the other section.
The significance of each of the nine
series varied, but by pooling the
scores a significance of better than
p = .000001 is obtained.

L. E. Rhine (1951) also reported a
large-scale test of placement-wishing.


TABLE 4
PLACEMENT WISHING

Author | Reference | Objects | Ss | Target | Items per throw | Number of items
Cox | (1951) | dice | ? | faces, area | ? | 39,168
Cox | (1954) | dice, marbles | ? | area | 48 | 32,256
Pratt^b | (1951) | dice | Cormack^c | area | varied | 27,648
Rhine | (1951) | dice | Forwald^c | area | 10 | 40,000
Rhine, L. | (1951) | various | ? (a) | area | ? | 113,100
Forwald | (1952b) | cubes | Forwald^c | area | ? | 27,000
Forwald | (1952a) | cubes | Forwald^c | area | 6 (b) | 4,500
Forwald | (1954b) | cubes | 2 | area | ? | 10,000
Forwald | (1954a) | cubes | Forwald^c | area (c) | ? | ?
Forwald | (1955a) | cubes | 4 | area | ? | ?
Forwald | (1955b) | cubes | ? | area | ? | ?
Forwald | (1957) | cubes | Forwald^c | area | ? | ?
Pratt & Forwald^a | (1958) | cubes | ? | area | ? | ?
Knowles | (1952) | pointer | 2 | sector | 1 | ?
Wilbur & Mangan | (1956) | balls, marbles | ? | area | ? | ?
Wilbur & Mangan | (1957) | balls | ? | area | ? | ?
Steen | (1957) | dice | 1 | faces | ? | ?

Note. Scoring is in terms of the number of objects coming to rest in the specified target area. In Knowles’ test, a rotating pointer was to stop at a given circle-sector. Steen adapted baseball rules and scoring to dice throwing. (a) Number of Ss used was not specified; (b) on each throw, three dice not wished for served as control; (c) mean distances calculated, respectively, for cubes in target and nontarget areas.

^a Duke Report.
^b Minor Report.
^c E, as S, working alone.
In the latter study, a nonspecified
number of Ss, including E, were
tested with objects released by lever
down an inclined plane, “willed to
fall to left or right” on the table below.
Marbles, coins, and cubes were used.
In this study, there was an attempt
to check the recording-error
problem: E read the score aloud and S
checked the recording; any object on
the dividing line was thrown again.
With a miscellaneous number of Ss,
46,800 released objects gave a total
of -28 hits. With a number of other
Ss, a total of 66,300 releases yielded
+176 hits. For the pooled data, the CR
of .88 was insignificant.
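Pooling here means adding the releases and the deviations of the two groups before forming a single CR. A minimal sketch, assuming a chance probability of 1/2 for the left/right placement; it reproduces the reported CR of .88 and the 113,100 releases entered for L. Rhine in Table 4:

```python
import math

# (releases, deviation in hits) for the two groups of Ss reported in the text
groups = [(46_800, -28), (66_300, +176)]

n_total = sum(n for n, _ in groups)
dev_total = sum(d for _, d in groups)
p = 0.5  # assumed chance probability of landing in the wished-for half
cr = dev_total / math.sqrt(n_total * p * (1 - p))
print(n_total, dev_total, f"{cr:.2f}")  # 113100 148 0.88
```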

With encouragement from Rhine,
the following test was carried out by
Wilbur and Mangan (1956). Glass
marbles and steel balls were
electrically released down a runway onto a
surface with six slots. Right and left
sides were alternately designated as
targets. A checker called out scores,
which were entered by a recorder,
both then checking the entries. The
score with marbles in the first series
was +31 hits and that with the steel
balls -6 hits, and the overall
deviation was insignificant. In a
second and final study, a test was
made with three degrees of roughness
of the inclined plane and glass balls
alone. The overall deviation was
insignificant.

A preliminary report by Knowles
(1952), with E and her brother as Ss,
was an attempt to wish a rotating
pointer, spun by hand, to stop at
given target segments, with a paper
scale divided into 15 such areas.
Scores were obtained in terms of hits
and amount of deviation from given
target areas. Both Ss had an equal
number of trials, with p = .0008 for
their pooled scores. E acted as
recorder for herself and her brother,
except that her husband recorded for
her third session. Working at home,
Steen (1957), a California
businessman, reported an “exploratory”
approach with one S, in which dice
throws were scored in terms of
baseball rules applied to a PK test. This
was considered by Thouless (1958) as
“distinctly encouraging,” and “For
those not interested in baseball, the
method could, no doubt, be adopted
to the rules of cricket” (p. 205).
SENSITIVES 

In starting the PK program, Rhine
made no attempt to search for
individuals who would be particularly
“gifted.” In the initial phase of the
dice tests, Ss for the most part,
therefore, were unselected. Almost 20
years later, Rhine (1953) commented
that “no special person had to be
sought out as mediums; in fact, my
wife and I began in 1934 by testing
ourselves, our family members, our
friends, students, and even casual
visitors” (p. 140). However, some
PK tests were carried out with Ss
presumed to be sensitive because of
earlier successes in ESP tests.
Gibson’s S was previously successful in
card guessing (Gibson, 1947).
Shackleton, one of Soal’s famous Ss (Soal &
Bateman, 1954), was given a PK test
by Parsons (1945), but the results
were negative.

Belonging in this category too are 
the dice tests with Blundin which are 
considered below in detail. Forwald’s 
largely solo efforts in placement wish- 
ing are also included here since con- 
tinuing success over a period of years 
would be presumptive evidence of a 
sensitive. 

Blundin 

The series with Blundin was carried
out in England. At first she was
included among a group of 10 Ss,
largely unwitnessed and doing their
own recording on a variable number of
trials for targets known only to E. In
these results, obtained by Mitchell and
Fisk (1953) and given in Table 2,
“targets were not exactly equalized”
(p. 49) and “the experiment may be
recognized as a pilot study in which
the possible effect of dice bias was
not definitely excluded” (Fisk &
West, 1957, p. 6). Blundin’s score
was recognized as of marginal
significance only.

In a subsequent test with Blundin
alone, S “became ill and the tests had
to be abandoned” after more than
5,000 die throws had been made
(Fisk & West, 1957, p. 3). While
Blundin was checking Fisk’s analysis
of the data against the original data
sheets, these “records were lost in the
confusion following her entry into
hospital” (p. 3). Although the
intended design was a Latin square, it
was incomplete, and the situation
involved, even if inadvertently,
optional stopping.

The next tests with Blundin (Fisk
& West, 1958) were started when her
“recovery by the spring of 1955, was
sufficient” (Fisk & West, 1957, p. 4).
With a new set of dice, tests were
carried out on 6 successive days of each
week, 48 die throws per day. Fisk
and West, on alternate weeks,
exposed targets, each in ignorance of the
other’s targets. Blundin’s recording
of her throws was witnessed by an
observer on about one-third of the
trials. The targets for Faces 1-6
were chosen from a table of random
numbers. The total die throws were
1,440 with Fisk and 1,392 with West,
but the authors (Fisk & West, 1957)
concluded that a significant
difference between them “has hardly been
established” (p. 4).

In the final phase 2 years later, the
targets were mailed to S in sealed
envelopes, each of which contained a
given target and was marked
alphabetically A-F in randomized order.
There were 600 die throws for each
target and 100 die throws per day.
“In previous experiments, Dr.
Blundin had thrown three dice in each
cast. As she did her own recording,
generally without witnesses... it
was thought that by using only two
dice at a time, likelihood of such
errors would be greatly reduced”
(Fisk & West, 1957, p. 5). In this
series, there was a score of +48 hits
(p = .032).
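The p of .032 for this final series can be checked from the design figures just given: 600 die throws for each of six targets, a chance probability of 1/6 per throw, and a deviation of +48 hits. A minimal sketch; the normal approximation and the two-tailed test are our assumptions, chosen because they reproduce the reported value:

```python
import math

throws = 600 * 6   # 600 die throws for each of the six target faces
p_hit = 1 / 6      # chance probability that a die shows the target face
deviation = 48     # reported excess of hits

sd = math.sqrt(throws * p_hit * (1 - p_hit))   # sqrt(500), about 22.36
cr = deviation / sd
p_two_tailed = math.erfc(cr / math.sqrt(2))
print(f"CR = {cr:.2f}, p = {p_two_tailed:.3f}")  # CR = 2.15, p = 0.032
```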

The pooled data for the four
series are given in our tabulation
(p = .00017). A technique was devised
of scoring each throw in terms of
D(ie) O(rientation), to obtain
quantitative estimates of success by
grading throws in terms of direct
hits and 90° off- and 180° off-hit
positions (cf. Mitchell & Fisk, 1953, 1954),
and the reported DO scores were
somewhat better than with the hit
criterion. The authors concluded that
“the results of these five experiments,
spread over a period of six years [did
not] provide fairly satisfactory
statistical confirmation” of a PK effect
(p. 5). Blundin, in another
connection, offers pertinent sources of error
in one of the rare introspective
reports of a card-guessing S after she
became a member of SPR (Blundin,
1952). In any case, lack of complete
randomization of targets and the
absence of witnesses and of independently
recorded data were irreparably basic
defects, subject to no possible
statistical rectification.


Forwald 


The result of the first of a long
series of tests by a Swedish engineer,
Forwald, was published by Rhine
(1951). Of the entire series, only one
of the last involved a substantial
number of Ss (Pratt & Forwald,
1958). For the most part they
constituted a solo series.

An electrically controlled
container released the dice down a
runway onto a walled table, S wishing for
placement into one or the other of the
equally divided horizontal sections.
Successive groups of five trials were
wished for, alternating the A and B
sides as designated target area in the
first series, and in the ABBA order in
the second series. In the first series,
the two areas were divided by an ink
line and in the second series by a
wire. The pooled data for the two
series (A area = +727 hits and B
area = -227) in Table 4 were given
a p of .00000006. The scores on two
respective nonwishing control series
were +104 hits for A area and -196
hits for B area (11,000 die releases).
Rhine (1951) noted that the
persistent favoring of the A side was
“due no doubt partly to the
alignment of the dice channel or some
other lack of structural symmetry in
the apparatus” (p. 51).
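The point about the apparatus can be made concrete. Suppose, hypothetically, that no PK operates at all but a structural asymmetry makes every die land in area A with probability slightly above 1/2. With equal numbers of releases wished for each area (as the alternating and ABBA orders provide), the expected A-target score is then positive and the expected B-target score negative by the same amount, so the bias cancels in the pooled total while producing the kind of A/B asymmetry seen here. The sketch below uses a made-up bias of 0.01, a made-up 500 releases per unit, and hypothetical function names; it shows only that pattern and does not account for the net excess in the wished series.

```python
from collections import Counter


def abba_targets(n_units: int) -> list[str]:
    """Target area for successive units (e.g. groups of five trials) in ABBA order."""
    pattern = "ABBA"
    return [pattern[i % len(pattern)] for i in range(n_units)]


def expected_deviations(bias_a: float, releases_per_unit: int, targets: list[str]) -> dict:
    """Expected deviation per target area under side bias alone (no PK).

    bias_a is a hypothetical excess probability (over 1/2) of landing in area A.
    """
    p_a = 0.5 + bias_a
    dev = {"A": 0.0, "B": 0.0}
    for target in targets:
        p_hit = p_a if target == "A" else 1 - p_a
        dev[target] += releases_per_unit * (p_hit - 0.5)
    return dev


targets = abba_targets(8)
print("".join(targets), Counter(targets))  # ABBAABBA with 4 A units and 4 B units
dev = expected_deviations(bias_a=0.01, releases_per_unit=500, targets=targets)
print(dev, dev["A"] + dev["B"])
```

Here A comes out at about +20, B at about -20, and their sum at about zero: a pure side bias shows up only as an A/B split.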

In the next test, the two
placement areas were divided with a fine
wire to force each released cube to lean
one way or the other (to reduce the
judgment factor in scoring) and, with five
trials as a unit, the order of target
area was ABBA. Nothing was said
of target area A bias, or its possible
relation to an overall score of +320
hits on the A side and -263 hits on the B
side (Forwald, 1952b, p. 60). With
respect to the pooled, insignificant
deviation of +57 hits (p = .49): “Undoubtedly
the explanation lies in some important
though subtle, change in my motivation”
(p. 62).

For the next report, in each of three
series, one kind of material (three
cubes) was wished for and, as a
control, the other trio, of a different
material, was ignored. The wishing now,
instead of “strong-willed” as
previously, was in a “calm manner.”
From here on, only the combined
score of hits for A and B targets was
given, and no reference was made to
A-side bias. The U curve of hits
previously noted failed to appear. The
pooled data for this study were
reported with a p of .00007, and the
deviation for the control component
was insignificant (Forwald, 1952a).
In view of the variations in
conditions, some reservations are inevitable
concerning the pooled value for all
three experiments of +691 hits (with
a p of 3 in 10,000,000).

In Forwald (1954b), two colleagues
were used as Ss. Although the overall
deviation was insignificant, the two
Ss showed a significant decline in
success from the first to the second half
of trials during the first part.
Apparently the Ss began to lose interest as
an insignificant number of hits
accrued, and Forwald therefore
informed them about the decline
effect to stimulate interest. In the
final half of the study the significant
decline failed to appear, and “This
suggests an inhibiting influence of
conscious knowledge upon the effects”
(Pratt, 1955).

Reverting to solo performance,
Forwald (1954a) substituted a
criterion in terms of mean score for the
number of (A and B) area hits. The
entire area was marked off in spaced
grids, and the position of each cube
was measured to obtain a mean value
for the respective A and B target
areas. The mean scores for the two
areas both were in the expected
directions. The falls were
photographed in one series and sent to
Duke for independent measurement.
The score for that series was
insignificant, but in the first
(unphotographed) series, measured visually
by Forwald, the difference was significant.

A comparison was now made
between two individuals working alone
and two other Ss working in
competition (Forwald, 1955a). The combined
score for the two Ss who worked in
competition with each other was
insignificant, but that for the two Ss
who worked individually was
significant. Of all four individuals, the only
significant score was that of Forwald
(one of the individual workers) “acting
as both S and E” (p. 46). A control
series “showed some relation to
the PK placement series” (p. 51).

The primary consideration in the
next report (Forwald, 1955b) was the
relation of cube roughness to the
first-throw effect. It is of some interest,
considering the original A-side bias,
that smooth cubes and cubes of the
first stage of roughness scored a minus (-)
mean score (on the A side), whereas very
rough cubes scored a plus (+) mean
deviation (on the B side).

In the next solo effort, comparing
the experimental and control series,
Forwald observed that “there must be a
substantial difference ... when
conscious attempts are made to
influence the dice and when the subject
is mentally neutral ... [But] the
result in the ‘control series’ is
significant in itself ... [so there could be a]
back-effect on the subject from the
placement series” (Forwald, 1957,
p. 116).

Forwald then moved on to Duke
for a “confirmation” (Pratt &
Forwald, 1958). Fourteen series were
carried out under the general
responsibility of Pratt with newly
constructed apparatus. Nothing was
said of tests for side bias, and no
control tests were reported. Five releases
constituted the experimental unit,
with successive wishes for the two
areas. Only the B-A differences were
reported, as means for Throw 1 of
each unit of five trials and for all


EDWARD GIRDEN 


trials combined. On the first two
series alone, the mean scores were
insignificant; only the score for
Throw 1 (first 300 trials) reached the
.01 criterion. In the next section of
two more series, with “P. M., as
passive observer and second recorder”
(p. 10), the scores were insignificant
for the total of 600 cube releases, and
it was concluded that the observer
“seemed to upset the scores” (p. 9).

A subsequent test series with PM as
cosubject, Forwald also acting as S,
was positive: on all throws, p = .010,
and on first throws, p = .006. Compared to the
other staff members, PM apparently
entered more fully into the spirit of
the test when she was cosubject, and
Forwald (Pratt & Forwald, 1958)
“felt he was able to concentrate upon
the task to the same degree as when
working alone” (p. 10). The results
on five overlapping series, each with a
different S, were undistinguished.

Because of the “success of the
team” (p. 10), two final series were
obtained with PM. Both Forwald
and PM recorded independently.
The mean scores were insignificant.
But the mean for the First Throws
in the penultimate series “is the
largest difference obtained in the
work at Duke University and is not
equaled by any of H. F.’s previous
results” (Pratt & Forwald, 1958,
p. 12). The comparable mean for first
throws in the final series was not
significant, but the two pooled gave a
p of .0002. In terms of planned
experimentation, this study was
freewheeling, with no predetermined
design and no recorders who were
ignorant of the wished-for area. In
addition, the tests were terminated
“because H. F. had to return to
Sweden at that time” (p. 12). The
third McDougall Award for
distinguished work in parapsychology was
given to Forwald in 1959 by the Duke
Parapsychological Laboratory for this
article, jointly published with Pratt.

Regardless of the debate
concerning Forwald’s results (cf. Forwald,
1954c; Nash & Forwald, 1956; Soal,
1954), in the entire series there
was not a single well-designed and
controlled experiment. Photographic
recording and objective measurements
would be the minimum requirement.
Even in the last study at Duke,
under the manifest supervision of
Pratt, the several series involved
optional stopping, poor recording
technics, and obvious lack of
supervision. This undoubtedly constitutes
the longest streak of reported
successes (1951-61) in all of the history
of the movement and exceeds that of
any of Rhine’s or Soal’s previous Ss.
And it is pertinent to ask, as Nicol
(1954) does with respect to Forwald,
“At what stage in the difficult history
of psychical research it became
permissible for sensitives to report their
own results and expect them to be
accepted as serious evidence in
psychical research, I do not know. The
number of such reports has grown
disturbingly in recent years” (p. 355).
Similarly, Thouless (1945a, 1945b,
1951) has been criticized for solo,
unwitnessed efforts (West, 1954a).

Yet Forwald’s work began some 9
years after the publication of
“ESP-60” (Rhine et al., 1940), which,
with Soal and Bateman (1954), is held
forth by some as one of the two most
convincing documents in the field. By
the standards of the 1940 Duke
ESP-60, all of Forwald’s (1951-1961)
data would have been unacceptable.
And these very standards are
contradicted by Rhine (1951) when, in
publishing Forwald’s first data, he
says: “There is of course justification
for the solo type of research in
parapsychology since all other sciences
use it and owe a great deal to it”
(p. 50). It should be explicitly clear that
the solo effort in scientific endeavor
is indeed an honorable practice, but
only if the conventional scientific
criteria are applied, e.g., replication
of reported results.


DECLINES 


The case is made for PK on the
basis of two criteria: an excess
number of hits and/or a significant
decline in the number of hits. The first
criterion constituted the hypothesis
for the tests carried out during the
initial period at the Duke Laboratory
under Rhine’s leadership. As noted
earlier, for Rhine (1946b) the basis
for the PK hypothesis was the
confidence that players often have that
“they can when ‘hot’ actually
influence the dice to some extent to follow
their desires without the use of
trickery of any kind” (p. 7). The second
criterion, the decline hypothesis,
was an unexpected result of a
post-mortem analysis, almost exclusively
of the original studies of the first
period (Table 1) and some reports
published shortly afterwards. The
position was given in three papers
(Rhine & Humphrey, 1944a, 1944b;
Rhine, Humphrey, & Pratt, 1945).
McConnell et al. (1955), agreeing with
this approach, specifically state that
“From an operational point of view,
one might say that the psychokinetic
effect is those two effects when they
appear without a known physical
mechanism to explain their
occurrence” (p. 269).

The declines, or "data structure effects" (McConnell, 1957, p. 134), are detected by comparing successive segments of trials within some unit of the record sheet. A variety of analyses have been made, and McConnell admits that some methods of decline analysis "have been tailored to fit the individual experiment" (McConnell et al., 1955, p. 272). For him, the most suitable is the quarter distribution of the page.

Essentially, the decline test was derived as follows. Assume that 50 standard runs (i.e., each of 24 die throws) are recorded in successive vertical columns, left to right. If the page is now divided into halves horizontally and vertically, the first halves of the first 25 runs fall in the upper left (first) quadrant and the last halves of the last 25 runs fall in the lower right (fourth) quadrant. The lower left (second) and upper right (third) quadrants contain, respectively, the last halves of the first 25 runs and the first halves of the second 25 runs. When the scores of the respective quadrants were pooled over a number of studies, the reported distribution showed a marked decline from the Q1 (upper left) to the Q4 (lower right) quadrant. In the ideal situation the number of hits would decrease progressively from Q1 to Q4. Instead of the QD (i.e., Quarterly Decline) of the page, the test could be applied to the half page or to the set (a group of runs made without a pause). It was this pooled difference between Q1 and Q4 (i.e., the decline in hits) which became the hypothesis for the second criterion of PK.
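
The quarter-distribution bookkeeping described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the scoring procedure used at Duke or by McConnell: a page is taken to be a list of runs (columns, read left to right), each run a list of 24 booleans marking hits, and the names `quarter_totals` and `quarter_decline` are hypothetical.

```python
from typing import Dict, List

Run = List[bool]  # one standard run: 24 die throws, True = hit on the target face


def quarter_totals(page: List[Run]) -> Dict[str, int]:
    """Pool hits into the four quadrants of one record page.

    Runs are columns read left to right; within a run, trials read top to bottom.
    Quadrants are numbered in temporal order, as in the text:
      Q1 = upper left  (first halves of the first half of the runs)
      Q2 = lower left  (last halves of the first half of the runs)
      Q3 = upper right (first halves of the second half of the runs)
      Q4 = lower right (last halves of the second half of the runs)
    """
    n_runs = len(page)
    run_len = len(page[0])
    if n_runs % 2 or run_len % 2 or any(len(r) != run_len for r in page):
        raise ValueError("need an even number of equal-length runs of even length")
    left, right = page[: n_runs // 2], page[n_runs // 2:]
    top, bottom = slice(0, run_len // 2), slice(run_len // 2, run_len)

    def hits(runs: List[Run], rows: slice) -> int:
        return sum(sum(run[rows]) for run in runs)

    return {
        "Q1": hits(left, top),
        "Q2": hits(left, bottom),
        "Q3": hits(right, top),
        "Q4": hits(right, bottom),
    }


def quarter_decline(pages: List[List[Run]]) -> int:
    """Pooled Q1 minus Q4 over several pages (positive = decline in hits)."""
    totals = [quarter_totals(p) for p in pages]
    return sum(t["Q1"] for t in totals) - sum(t["Q4"] for t in totals)


# Toy page: 50 runs of 24 trials, hits only in the upper-left quadrant.
toy_page = [[(c < 25 and i < 12) for i in range(24)] for c in range(50)]
print(quarter_totals(toy_page), quarter_decline([toy_page, toy_page]))
```

On the toy page all 300 hits fall in Q1, so the pooled decline over two such pages is 600. The arithmetic itself says nothing about why hits fell; a change of target face across the page produces the same pattern as any psychological effect.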

Since the decline hypothesis was derived largely from the results obtained with the Early Dice tests, these data will be considered separately from the Later Dice tests. (The results obtained with objects other than dice, Table 3, and with placement series, Table 4, are not suitable for testing the lawful declines attributed to the dice scores.) This separate treatment is especially called for with respect to the decline hypothesis. First, Rhine makes the case largely on the basis of the early tests. Secondly, since the hypothesis was derived from these data, independent evidence is required for its confirmation. The question, then, is the extent to which these declines occur in the early dice tests and the extent to which supporting evidence is forthcoming in the subsequent tests.


Early Dice Tests 


The prior analysis in terms of target scores readily suggests that the disclosed weaknesses in experimental design apply with equal force to the lawful declines in scoring. Additionally, one can anticipate that these data would not be suitable for a test of the post hoc decline hypothesis, since the experiments were not designed for this purpose. As Pratt (1947b) has since noted:

There are two reasons why the number of analyses made in the earlier investigations to discover possible psychological factors underlying hit pattern cannot be made. . . . the targets were not randomly distributed in the order in which they were used on the record page [and] no one was interested in position effects at the time the tests were made, and the order of targets on the page had no relevance for the objective of the experiment. The sole purpose was to see if a significant total score could be obtained under carefully guarded conditions which excluded the possibility of dice bias (p. 198).

Perhaps the most extensive control test of "wish versus no wish" is the Frick 60-Dice solo series (Rhine & Humphrey, 1945a). As previously noted (Table 1), there was a deviation of +582 hits with the six-face as target on 2,172 runs and a deviation of +576 hits on an equal number of runs when wishing for 1's (or wishing for 6's not to appear).

Both Frick Series A (experimental, wishing for 6's) and Series B (control) are included in the QD analysis (Rhine & Humphrey, 1944a). According to this report, the QD decline on the control series was typical and considerably larger than when wishing for 6's. "Both series, concluded H.L.F[rick], must therefore have been due to imperfections in the dice favoring the six-face" (p. 47). Not so, says Rhine, for the control series gave a "normal QD pattern, one more typical, in fact, than that of the original Sixty-Dice Series!" From this it was concluded that in PK, as in ESP, "the essential mental act is unconscious" (p. 48). This interpretation can be compared with Frick's own view that the "dice are crooked" (Rhine & Humphrey, 1945a, p. 205).

Among the other early tests, no other true control test is to be found. In the large Gibson series (Gibson et al., 1943, p. 234), a block of 1,944 runs was performed with targets equalized, but a QD analysis for this block was not presented. Indeed, only a very few unwilled control tests were made, and no effort was made to test them for position effects. When Pratt (1947a) reported a nonwishing control test, which demonstrated high-face dice bias, no analysis with respect to declines was included. Lacking control tests, the consequent dependence upon the Probability Model demands rigorous control of the known physical conditions, e.g., random selection of target designations and equal representation of all faces as targets.

In the early tests, dice bias was clearly manifest, confirming Weldon's much earlier evidence. In these tests, Ss were more or less free to choose their targets, and their choices ran largely to the high faces. Suppose, for example, that a given set of dice has a six-face bias. If, in addition, the six-face is chosen as target largely in the first part of the series and other faces are selected as targets later in the test, then the situation is ideally designed for a decline in hits to occur.
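
The confound can be made concrete with an expected-value calculation in which no PK operates at all. The sketch assumes a hypothetical die whose six-face turns up with probability 0.18 instead of 1/6, the other five faces sharing the remainder equally, and a hypothetical page on which the six-face is the target throughout the left half while faces 1 to 5 are rotated as targets on the right half. Exact fractions are used so the arithmetic can be checked by hand.

```python
from fractions import Fraction

# Hypothetical biased die: the six-face is slightly favoured and the
# other five faces share the remaining probability equally.
P_SIX = Fraction(18, 100)
BIASED = {face: (1 - P_SIX) / 5 for face in range(1, 6)}
BIASED[6] = P_SIX
FAIR = {face: Fraction(1, 6) for face in range(1, 7)}

TRIALS_PER_QUADRANT = 25 * 12  # 25 runs x the 12 trials in half of a 24-throw run


def expected_hits(targets, face_prob):
    """Expected number of hits for a sequence of target faces, one trial each.

    A trial is a hit when the die shows the target face, so each trial
    contributes the probability of its target face. No PK term is included.
    """
    return sum(face_prob[t] for t in targets)


# Left half of the page (Q1, Q2): the subject always targets the six-face.
q1_targets = [6] * TRIALS_PER_QUADRANT
# Right half of the page (Q3, Q4): targets rotate through faces 1 to 5.
q4_targets = [1 + i % 5 for i in range(TRIALS_PER_QUADRANT)]

q1 = expected_hits(q1_targets, BIASED)
q4 = expected_hits(q4_targets, BIASED)
print("biased die: Q1 =", float(q1), "Q4 =", float(q4), "decline =", float(q1 - q4))
print("fair die:   Q1 =", float(expected_hits(q1_targets, FAIR)),
      "Q4 =", float(expected_hits(q4_targets, FAIR)))
```

The expected count falls from 54 hits in Q1 to 49.2 in Q4, a "decline" of 4.8 hits per page produced entirely by the die and the target schedule; with a fair die both quadrants expect exactly 50.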

In the Gibson "cage" series (Gibson et al., 1944), Faces 4, 5, and 6 were used as targets much more frequently on the left side of the page, and Faces 1, 2, and 3 proportionately more on the right side of the record page. Examining the raw data, kindly made available to him by Betty Humphrey, Parsons clearly demonstrated that if all faces had been used with equal frequency there would have been an incline, rather than a decline, in hits. And an examination of a total of 10 studies in which all faces were represented as targets indicates "no sound evidence that a QD decline occurs on all six faces." In at least seven important American series, the scores were markedly bunched up towards the start of the page. In the first and only successful ASPR study, "more than one-third of the entire positive deviation obtained in the whole experiment comes from the first half of the first run" on each record sheet (Dale, 1946, p. 137). This is also of some interest with respect to Forwald's placement series, in which the criterion became the first trial score. Such a degree of bunching of the anomalous scores into just those trials where physical conditions were likely to be unsteadiest is strongly suggestive of a physical condition.
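
Parsons' argument amounts to asking what the left-to-right change would have been had every face carried equal weight. A minimal sketch of that reweighting, assuming per-face hit and trial counts are available for each half of the page: compute each face's hit rate separately in each half, then average the six rates with equal weight, so that a half page cannot look better merely because it used a favoured face more often. The counts below are invented for illustration and are not Gibson's or Parsons' data.

```python
from typing import Dict, Tuple

# face -> (hits, trials) for one half of the record page
Counts = Dict[int, Tuple[int, int]]


def pooled_rate(counts: Counts) -> float:
    """Hit rate over all trials in the half page, whatever the target mix."""
    hits = sum(h for h, _ in counts.values())
    trials = sum(n for _, n in counts.values())
    return hits / trials


def face_balanced_rate(counts: Counts) -> float:
    """Unweighted mean of the per-face hit rates (each face counts equally)."""
    rates = [h / n for h, n in counts.values()]
    return sum(rates) / len(rates)


# Invented counts: high faces dominate the left half, low faces the right,
# and every face individually scores slightly better on the right.
LEFT: Counts = {6: (200, 1000), 5: (190, 1000), 4: (180, 1000),
                3: (30, 200), 2: (29, 200), 1: (28, 200)}
RIGHT: Counts = {6: (41, 200), 5: (39, 200), 4: (37, 200),
                 3: (155, 1000), 2: (150, 1000), 1: (145, 1000)}

print(f"pooled:        left {pooled_rate(LEFT):.4f}  right {pooled_rate(RIGHT):.4f}")
print(f"face-balanced: left {face_balanced_rate(LEFT):.4f}  "
      f"right {face_balanced_rate(RIGHT):.4f}")
```

With these made-up counts the pooled rate drops from about 0.183 to about 0.158 across the page, while the face-balanced rate rises from about 0.168 to about 0.173: the kind of reversal from decline to incline attributed above to Parsons' analysis, here produced purely by the uneven target mix.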


The evidence thus strongly suggests that the position effect most likely was an attribute of dice bias, i.e., associated mainly with high faces. Of equal significance is the lack of correlation between hits and declines. The only evidence offered for a correlation of position effects with hits was found in Pratt's (1947b) analysis of the Gibson Machine Series (Gibson et al., 1944), which he admits "can only be taken as suggestive of an increase in the prominence of position effects as the total score goes up" (p. 202).


Later Dice Tests 


It is to be noted that, as with the early tests, positive results were seldom obtained with respect to both criteria: number of hits and decline in scores. In the first and only positive study from the ASPR, by Dale (1946), an excess of hits was reported but there was no significant change in scoring rate. On the other hand, in the study by McConnell et al. (1955), a small but significant QD was reported although the total number of hits was within chance expectancy. In a number of studies, declines of one sort or another have been noted, but they did not conform to the lawful decline.

Of the later tests, only one deserves attention in terms of the QD hypothesis. Overlapping in time with McConnell's (1955) night series, the more extensive day series was carried out in 1948-50 with 393 volunteer Ss (McConnell et al., 1955). For one-third of the trials one pair of dice was cup thrown, and for the remainder another pair was machine rotated. It is not clear whether the machine dice were the same as those used for the night series (McConnell, 1955). Targets given by E were accepted by 281 Ss, and 112 Ss chose their own. It was reported that the results of the 383 Ss (each of whom had a single target) were essentially no different from those obtained with the 7 Ss who were permitted to change targets during the test session. M, S, and P, as Es, recorded for 173, 132, and 88 Ss respectively, making hand records of the scores which were later matched against photographic recordings of the machine throws. A sample of 50% of the photographic recordings was incorporated in that brief report.

As reported (McConnell et al., 1955), "the first of the two statistical effects constituting psychokinesis was not found" (p. 271), the deviation being negligible (-91 hits). For the authors, the "half page, rather than the page, is the natural psychological unit in the present experiment" (p. 273). The decline analysis, incorporating two-thirds of all the trials (56,592 initial versus an equal number of final trials), indicated a significant decrease of 218 hits (CR = 2.72). Even accepting McConnell's statement at face value, it is still not entirely satisfactory that the targets used "form an essentially random sequence, with one exception: some target numbers were used more often than others" (p. 274).

Humphreys (1956), a skeptic, has emphasized that the "two interactions involving halves of the page, which signified a break in the experimental procedure, are smaller than one would expect to obtain by chance" (p. 291), and he concluded that the phenomenon was not a genuine psychological effect because, with experimental conditions held constant, significant individual differences were lacking. In his rebuttal, in which the disputed interactions were judged to be of chance origin, McConnell (1958) raised another issue, arguing that Humphreys had no right to assume that

the psychokinetic effect, if real, was caused by the 393 subjects whom we tested. In our paper we said "The procedure was one in which the experimenter was present and aware of the target number" . . . the decline effect may have been partially or entirely caused by the experimenters. Thus the model which Dr. Humphreys uses relates to a hypothesis which we could not claim to have tested (pp. 214f.).


The mean scores of the Ss for each of the three Es were extremely alike. What seems of equal importance is that the variance of the groups of Ss for the three Es was markedly different. The least, and an insignificant, decline occurred for the 173 Ss tested by McConnell. When his two assistants were Es, there were reportedly significant declines in hits. The main source of significance in the initial versus final decline came from the data of the 132 Ss tested by Snowdon. Concerning the possible role of the E, a further contrast arises from a comparison with the overlapping night (sleep) series. In the latter, the results with McConnell as S were reportedly positive, whereas the results with Snowdon as S (a long-distance test while in Peru) were insignificant (McConnell, 1955).

Humphreys' (1956) suggestion "that he would like to see an independent replication of the experiment" (p. 290) brings a nod from McConnell (1958), who says, however, "not in simple replication, for it must be remembered that our work is already a validation of similar work by others" (p. 215). Yet a replication in 1951-52 of the McConnell day series with the same Duke apparatus, by Dale and Woodruff (Murphy, 1952b), was completely negative with respect to hits. This series of 68,208 die recordings with 108 Ss was equally negative with respect to declines in scoring.

It cannot be overlooked that, with better control of conditions in the succeeding dice-throwing tests, absolute differences were attenuated until they became insignificant, and declines became smaller. On inferential grounds, the residual effects were most likely statistical artifacts resulting from minor inadequacies in experimental conditions. As McConnell recognizes, the decline effect in his data was small, and under such circumstances "Chance can never be ruled out as a possibility no matter what level of significance is accepted" (Humphreys, 1956, p. 291).


DISCUSSION 


For the beginning of the scientific controversy concerning "psychical" phenomena, a convenient reference point is the formation of the Society for Psychical Research in London in 1882 and the subsequent organization of the American Society. Attention was largely devoted to the observational study of spontaneous phenomena, such as reported premonitions, clairvoyant and telepathic occurrences, and telekinesis.

For the young psychologist, and for most laymen, the story begins with the publication of "ESP" (Rhine, 1934), a summary of the first Duke card-guessing tests. (Many are surprised to discover that card tests were carried out long before Rhine's efforts were initiated at the Duke Parapsychology Laboratory.) This volume was characterized as "one of the most important contributions to psychical research yet published" (Murphy, 1934, p. 454), and Murphy predicted that "the twenty-fifth century will give Rhine's experiments the importance they deserve in the history of science" (p. 457).

With this publication by Rhine, the controversy was refocused not only between Believers and Skeptics (cf. Kennedy, 1939; Wolfle, 1938) but also within the parapsychological camp itself. The old-school British workers, associated with the SPR for 50 years, were challenged by ESP (Rhine, 1934), which reported a "large number of persons who, it was claimed, had guessed consistently above chance expectations" (Soal, 1947, p. 25). It was particularly disturbing to the British workers, for whom the finding of a sensitive was a rare event, to face the report that "at least nine or ten gifted subjects were discovered within the small community of Duke University" (Soal, 1947, p. 25). The older controversy needs no retelling here, but the same situation obtained with the subsequent PK tests: first reports by Rhine of many successful Ss subsequently lacked confirmation by British workers, and the magnitude of the scores was reduced under more carefully controlled conditions. The attacks from the skeptics were no stronger than the pertinent and pointed criticisms from within the group. Rhine admits that Soal has been among his most severe critics (Soal & Bowden, 1959). It should be noted here that the demand for replication is considered reasonable: "Positive results in card guessing and dice throwing have been reported in America on a scale for which there is no parallel [in England]. If the American claims are genuine we should be forced to assume that the psychic faculty is extremely rare in England compared to America" (Soal, 1948, p. 183).

With respect to PK, the evidence is strikingly clear that, to a large extent, the earlier PK tests were poorly designed and badly executed. There is complete justification for the judgment that "the Duke experimenters seem to have fallen into pitfalls that an intelligent school boy should have avoided" (Soal, 1948, p. 185). As examination of the data discloses, many of the tests fall by the wayside by the simplest criterion of mere replication of test conditions. For example, it was noted that time and again Ss were free to choose the target face, which is "particularly unsatisfactory"; that equal numbers of trials were not obtained with the several target faces; and that the hand-shaken container used in a majority of these pioneer Duke reports was not completely foolproof. With the motor-driven dice machine (e.g., Gibson et al., 1944; Price & Rhine, 1944), the extent to which the speed of the die caster was adjusted to suit the S was not reported (cf. West, 1945).

The difficulties raised by an experimental design resting upon theoretical rather than empirical probabilities need no detailed treatment with respect to PK. Dice bias was demonstrated long ago in Weldon's long series (Pearson, 1900) and presumably can be avoided by using "true" dice and spaced replacement (Scarne, 1956). Whatever else the entire series of PK efforts discloses, even in its most favorable light the high hit scores are attributable to dice bias. The only support for the hypothesis of decline in scoring is to be found in the study by McConnell et al. (1955). That effect, however, was not confirmed in a subsequent repetition with the same apparatus (Murphy, 1952b). The inherent danger in depending upon the Probability Model is illustrated by Oram's experiment of "random" selection of numbers from Kendall and Smith's tables (Oram, 1954), which "provides the most significant single QD quarterly decline in annals of psychical research" (Nicol, 1955, p. 80). The use of a rigid mechanical system such as a rotating dice cage (Gibson et al., 1944; Price & Rhine, 1944), especially with the same dice, gives no assurance of a random distribution of dice faces (cf. Brown, 1957). In fact, West (1945) warned of the danger of turning out "repetitions that are not properly random" (p. 286). This is especially pertinent when the declines are small. Again, the need for a controlled experiment as the proper corrective is clearly indicated: a test of declines without wishing is still to be carried out.
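
Checking dice against empirical rather than theoretical probabilities can start with a Pearson chi-square test of face counts from throws made with no wishing at all. A minimal sketch, assuming such counts from an unwilled control series are available; the counts below are invented for illustration, and 11.07 is the standard 5% critical value of chi-square with 5 degrees of freedom.

```python
from typing import Sequence

CRITICAL_5DF_05 = 11.07  # chi-square critical value, df = 5, alpha = 0.05


def chi_square_uniform(counts: Sequence[int]) -> float:
    """Pearson chi-square of observed face counts against equal expected counts."""
    total = sum(counts)
    expected = total / len(counts)
    return sum((obs - expected) ** 2 / expected for obs in counts)


# Hypothetical counts for faces 1..6 from a no-wish control series (not real data).
control_counts = [1560, 1580, 1590, 1640, 1660, 1770]
stat = chi_square_uniform(control_counts)
print(f"chi-square = {stat:.2f}; reject fair die at 5% level: {stat > CRITICAL_5DF_05}")
```

These invented counts give a chi-square of about 18.1, enough to reject a fair die at the 5% level. Because the comparison rests on throws made without wishing, the check does not depend on the Probability Model alone.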

In the light of Weldon's series, it was inexcusable to have permitted the unrestrained use of high faces, or sixes, almost as the sole targets in the quoted one-half million throws, even though this was the Ss' preference and it was felt necessary to preserve a game-like environment. To date (1961), there is no published study which has thoroughly incorporated a Latin square design for target assignment to control for dice bias.
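
A Latin square assigns each face as target exactly once in every session and once in every position on the page, so that any bias of the dice toward a face is spread evenly over early and late trials instead of being confounded with position. The sketch below is one way to build such an assignment, assuming six sessions (rows) and six successive page segments (columns); that layout, the seed, and the function names are illustrative assumptions rather than a design taken from any study reviewed here.

```python
import random
from typing import List

Square = List[List[int]]


def cyclic_latin_square(n: int = 6) -> Square:
    """n x n Latin square on symbols 1..n: row r, column c holds ((r + c) mod n) + 1."""
    return [[(r + c) % n + 1 for c in range(n)] for r in range(n)]


def randomized_latin_square(n: int = 6, seed: int = 0) -> Square:
    """Shuffle rows, columns and symbol labels of a cyclic square.

    Each of the three shuffles preserves the Latin property. The result is
    randomized but is not drawn uniformly from all n x n Latin squares.
    """
    rng = random.Random(seed)
    square = cyclic_latin_square(n)
    rng.shuffle(square)                          # permute rows (sessions)
    cols = list(range(n))
    rng.shuffle(cols)                            # permute columns (page segments)
    square = [[row[c] for c in cols] for row in square]
    labels = list(range(1, n + 1))
    rng.shuffle(labels)                          # relabel the target faces
    return [[labels[v - 1] for v in row] for row in square]


def is_latin(square: Square) -> bool:
    """True if every row and every column contains each symbol 1..n exactly once."""
    n = len(square)
    symbols = set(range(1, n + 1))
    if any(len(row) != n or set(row) != symbols for row in square):
        return False
    return all({row[c] for row in square} == symbols for c in range(n))


design = randomized_latin_square(seed=1961)
for session, targets in enumerate(design, start=1):
    print(f"session {session}: target face for segments 1-6 = {targets}")
```

Randomizing the cyclic square keeps targets from following a fixed rotation that a subject could anticipate, while every face still appears equally often in every segment.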

Additionally, failure to ensure accuracy of scores by photographic recording or by independent observers ignorant of target designations introduced irreparable weaknesses. The sources of recording errors are many, and motivation may determine the kind of error committed. Thus Kennedy and Uphoff (1939), skeptics, have given such evidence for card-guessing studies by showing differences in the type of error depending upon whether S was a Believer or a Nonbeliever. Kaufman and Sheffield, also skeptics, reported similar findings for dice throwing, with Believers tending to make errors in favor of PK whereas Disbelievers tended to make errors in the opposite direction (Anonymous, 1952).

This, of course, is one of the basic reasons for avoiding solo efforts, let alone depending upon the observations of untrained individuals. The importance of this consideration for psychic research was emphasized by members of the London Society at the turn of the century, and by many psychologists since (e.g., Coover, 1917; Kennedy, 1939). One would expect it no longer to be an issue. But the original dice tests at Duke and the continued solo efforts of Forwald are no encouragement at all in this respect (cf. Forwald, 1959, 1961).

As better controls were introduced, the high scoring was reduced. The later, more careful studies of PK showed only chance deviations in hits. Nor is it relevant to argue that there is a statistically significant decline in scoring. Rhine's (1946b) statement that "The best controls on the faulty dice hypothesis . . . consist of significant differences in scoring rate due to certain effects of position of the trial in the test sequence" (p. 9) is simply a profession of strong belief, convincing only to those who already hold it. The post hoc interpretations by no stretch of the imagination constitute "the best controls on the faulty dice hypothesis," nor do they rectify the initial weaknesses in experimental design. The decline effect is simply a new hypothesis requiring new confirmation. Since the QD papers, the better designed subsequent tests have generally been negative in outcome, the only exception being the preliminary report by McConnell et al. (1955), which, as already noted, lacked subsequent confirmation.

Where control tests clearly demolish the pre-experimental hypothesis, appropriate post hoc interpretations are offered: the QD of Frick's control series was "typical" but that of the experimental series was not. The scores in both series were practically identical and highly significant, and this in a skeptical S. Although the view is maintained that a skeptical attitude interferes with performance, Frick's results are nonetheless accepted by Rhine as evidence for PK because of these position effects. Likewise, McConnell (1958) suggested a post hoc interpretation which his experiment had not been designed to test. In replying to Humphreys' (1956) criticism of a lack of significant individual differences, McConnell recalled the earlier statement that "the procedure was one in which the experimenter was present and aware of the target number. Thus, the word 'subject' is used provisionally or in a nominal sense" (McConnell et al., 1955, p. 270) and argued that "we cannot rule out the possibility that the decline effect may have been partially or entirely caused by the experimenters" (McConnell, 1958, pp. 214f.). A test of this question would have required that E not be aware of target designations. Commenting on his nonwishing control series, which confirmed high-face bias in excavated dice, Pratt (1947a) goes so far as to say:

In such a control series there is always present, however, the theoretical possibility that the thrower is influencing the dice by PK whether he is consciously setting himself to do so or not. This causes no difficulty so long as the research is concerned with the problem of evidence that PK occurs, for all that matters is whether the difference is significant (p. 56).

This is an application of the Probability Model with a vengeance but, of course, the converse is equally applicable: no matter how intense the wishing, what assurance is there that the results are in fact due to wishing? The only solution to this difficulty would be provided by the inclusion of a control series. Should Pratt's view be taken seriously, then of course no controlled psychological experimental test of PK is possible.

The fact of the matter is that the PK data are largely unrelated to psychological problems. In the early and ancient history of telekinesis, reports are to be found of individuals "willing" objects to rise, as well as other illustrations of mind over matter. These reports, mythological or not, do suggest psychological problems which are experimentally testable. And the use of dice is entirely proper, because the game-like situation evokes interest. There have been suggestions, published and otherwise, that a better test would be to determine whether a sensitive could will the movement of a delicate wire: "Unlike any other force of which we have any experience it [PK] is more successful in influencing 96 dice thrown together than a single die . . . [yet] it is incapable of moving a delicately suspended needle" (Soal, 1948, p. 185). No reports of such tests have originated from the Duke Laboratory, but the British tests, published and unpublished, were uniformly negative (also cf. Carrington, 1938).

Few of the PK reports fulfill the basic requirement of a psychological experiment. To have psychological justification, there must be a controlled comparison such as wish for versus wish against, wish for versus no wishing, or Believers versus Disbelievers. This is but the simplest of considerations. The few studies which involved such a control test were negative (e.g., Van de Castle, 1958), and the interpretations varied from ignoring this outcome to the gratuitous "explanation" that the process was unconscious. As a hypothesis to be tested, this alternative is also properly acceptable. But nowhere were tests made of Ss versus no Ss; this hypothesis requires a control series without wishing, i.e., with no knowledge of the targets by anyone. Evidence of PK as a psychological phenomenon is therefore totally lacking, and this deficiency will persist until the effect is produced in the presence of a specified psychological variable and does not appear in its absence.

PROBLEM


Whereas unrotated factors are mere mathematical entities, peculiar to a particular matrix, many psychologists contend that factors rotated to simple structure correspond either to dimensions or to causal influences which have a real existence in the sense that they can be recognized independently by other approaches (Cattell, 1952). However, no general agreement or knowledge exists as to what range of scientific entities the factors may represent. For example, are factors only broader concepts into which measurements can be resolved; are they organizing influences within more varied expressions of phenomena; or are they causal influences in the general scientific sense?

Although it is possible to answer this question on philosophical grounds to a certain degree, we can be sure that the matter is being properly answered only when we have empirical data from a large number of instances. A major reason why psychologists lack dependable information with which to answer questions of this kind is that they continue to apply factor analysis to psychological data in which they do not know beforehand what influences are really at work.

Even working with a known structure will not throw light on these aspects of factor analysis if it is too remote from the scientific situations in which factor analysis is commonly applied. For example, a completely artificial model, in which correlations are calculated by back multiplication from a postulated structure, has some place in explaining what factor analysis does. But this easy model can at best only return to us the mathematical structure which was written down. Somewhat more concrete examples, like Thurstone's box problem (1947, pp. 140-144) (if, indeed, such an experiment were to be done on real boxes!) and the bottle experiment (Barlow & Burt, 1954), have the advantage that real experimental errors of measurement can be included in the analysis. Both of these, however, involve static relationships in which the factors are all limited to the class of physical or spatial dimensions. They thus fail to simulate the diverse properties and interactive qualities of influences in the psychological situation. Their principal contribution seems to have been to show that factor analysis by means of the simpler linear approximation can produce tolerably clear statements of relations and structures that are known to be mathematically complex.
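
The completely artificial model mentioned above can be written as R = L Phi L', with unities placed in the diagonal, where L is the postulated matrix of loadings and Phi the matrix of correlations among the factors. The pure-Python sketch below back-multiplies an invented two-factor pattern for four variables with a factor correlation of 0.4; both the pattern and that value are assumptions for illustration.

```python
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product a @ b."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]


def back_multiply(loadings: Matrix, phi: Matrix) -> Matrix:
    """Correlations implied by loadings and factor correlations.

    Off-diagonal entries are (L Phi L')_ij; the diagonal is set to 1.0,
    i.e. each variable's uniqueness is taken to be 1 - communality.
    """
    common = matmul(matmul(loadings, phi), transpose(loadings))
    n = len(common)
    return [[1.0 if i == j else common[i][j] for j in range(n)] for i in range(n)]


# Hypothetical simple-structure pattern: variables 1-2 on factor A, 3-4 on factor B.
L = [[0.8, 0.0],
     [0.7, 0.0],
     [0.0, 0.6],
     [0.0, 0.9]]
PHI = [[1.0, 0.4],
       [0.4, 1.0]]

R = back_multiply(L, PHI)
for row in R:
    print(" ".join(f"{r:5.2f}" for r in row))
```

Whatever factoring and rotation is then applied to R can at best hand back the L and Phi that were typed in, with no measurement error and nothing beyond the postulated structure, which is the limitation of such models noted above.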

The senior author has, therefore, over the past decade set up several examples which approach the ideal of being: (a) organic, in the sense of dealing with vital, chemical, or behavioral measurements; (b) inclusive of experimental error, in a natural measurement situation; (c) not artificially brought, by selection or other means, to the mathematician's artificiality of orthogonal factor relationships; and (d) of incontrovertibly known factor structure before factor analysis begins. These have included a study of the growth of 400 tomato plants under different conditions of two factors, hydroponic feeding and light; two studies of 100 cups of coffee (Cattell & Sullivan, 1962); and, with the second author, the study of 80 balls which will shortly be described. A survey of experience in laboratories and the graduate department shows that it is through the lack of an available standard, real example that many factor analytic procedures and statistical tests fail to achieve positive evaluation and revision.

The ball study, like the other real examples mentioned above, provides the research worker with a known structure upon which various factor analytic practices can be tried. For example, the study may be used to test a proposed criterion for completeness of factor extraction and to study the various ways of estimating communalities, since the number of factors is definite and known. Again, it may be used for evaluating kinds of approximations to factor solutions, for the relative magnitude and the sign of the loadings are known with reasonable accuracy. It may also be used to test procedures for rotation to simple structure, such as new analytic techniques, since the correlations between the primary factors were controlled.

With controlled relations among the factors, the ball study is enlightening in resolving the important issue of orthogonal versus oblique factor rotations. It is with this issue that the present article is primarily concerned.

The question of orthogonal versus oblique factor axes seemed to the senior author to have been answered 20 years ago by the same logical and psychological considerations that convinced Thurstone (1947) that only oblique factors (which include orthogonal factors as a special case) could be expected to correspond to scientific entities. These considerations are:

1. It is unreasonable to expect that a great variety of influences operating and interacting in the same universe would be completely uncorrelated.

2. The observation that, in most examples where one in fact knows the structure, it is perfectly obvious that the factors (even if uncorrelated in the population) will tend to be correlated in the given sample in which they have to be discovered (e.g., length and height in a population of boxes).

3. The experimental finding (Cattell & Warburton, 1961) that second-order factors, obtainable only through the obliqueness of primary factors, tend to reproduce themselves in different matrices with considerable consistency and good psychological meaning.

4. The finding (Cattell, 1957) that simple structure oblique factors are experimentally replicable, across various samples, with decidedly greater constancy of pattern than are orthogonal factors.

From these and other considera- 
tions Cattell concluded that the goals 
of orthogonal axes and uniquely de- 
termined simple structure are mutu- 
ally inconsistent (except by some 
coincidence). Nevertheless, it is true 
that even in recent years a minority 
of psychologists, including some of 
high reputation, have continued to 
advocate and teach the search for 
replicable, simple structure factors 
while maintaining orthogonality of 
representation. Moreover, they have 
advocated this procedure not as an 
approximation or as a distortion
justified by the wish to avoid the
more complicated calculations ac- 
companying oblique factors, but as a 
correct scientific principle. It seems 
desirable to show clearly why free- 
dom for obliquity of rotation is es- 
sential by presenting a concrete il- 
lustration, the ball study. 


BALL STUDY 


Out of more than 150 balls pur- 
chased in local stores, 80 were selected 
to represent the typical range of 
qualities in balls as the term is under- 
stood. The balls ranged in size from 
about 1 inch in diameter to more than 
7.5 inches, and in weight from .1 
ounce to more than 15 ounces. When 
dropped onto a standard rebounding 
surface from a height of 28 inches, 
the range of rebound was from almost 
0 to more than 19 inches. 

The balls were selected primarily 
to cover the range of the attributes, 
size, weight, and height of rebound. 
(Henceforth, we shall use the physi- 
cist’s term, elasticity, rather than the 
height of rebound, but we shall rede- 
fine elasticity to be the ratio of the 
rebound height to the initial height.) 
To get good distributions some of the 
solid wooden balls were hollowed out 
to make them lighter; other balls 
were covered with cloth or with tape 
to make them heavier and at the same 
time to reduce their ability to re- 
bound. Included in the sample were 
small marbles, ping-pong balls, golf 
balls, hollow rubber balls of all sizes, 
tennis balls, baseballs, softballs, cro- 
quet balls, and large playground 
balls.¹
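Because elasticity is redefined above as a ratio, the rebound figures convert directly. A minimal Python sketch, assuming only the 28-inch standard drop height given in the text; the function and constant names are illustrative, not the authors'.

```python
# Elasticity as redefined in the text: rebound height / initial drop height.
DROP_HEIGHT_IN = 28.0  # standard drop height (inches) used for the rebound test


def elasticity(rebound_in: float, drop_in: float = DROP_HEIGHT_IN) -> float:
    """Return the ratio of rebound height to drop height (dimensionless)."""
    if drop_in <= 0:
        raise ValueError("drop height must be positive")
    return rebound_in / drop_in


# Upper end of the reported rebound range (a little over 19 inches)
print(round(elasticity(19.0), 3))  # 0.679
```

So the sample spans elasticities from about 0 to roughly .68 on this scale.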


¹ A more detailed description of the 80 balls,
the matrix of measurements of the balls on 
the attributes, an analysis of the 32 variables, 
the product-moment correlation matrix, and 
photographs of the equipment used in the ex- 
periment, can be found in the study by Dick- 
man (1960) listed among the references. 
TABLE 1

CORRELATIONS BETWEEN THE ATTRIBUTES DIRECTLY MEASURED

Attributes: Size, Weight, Elasticity, Length
Next 80 pieces of string ranging
from 10 to 60 inches in length were 
cut so that the lengths distributed 
normally over the range. The differ- 
ent lengths were assigned to the balls 
in such a way that length would have 
near zero correlations with size, 
weight, and height of rebound of the 
balls. The correlations between these 
four attributes are shown in Table 1.
The strings were attached to the 
balls so that the ball suspended from 
a hook could swing like a pendulum. 
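The text does not say how the string lengths were matched to the balls. One simple way to obtain near-zero correlations, sketched here purely as an assumption, is a random search over orderings of the lengths that keeps the ordering whose largest |r| against size, weight, and elasticity is smallest. All data below are hypothetical stand-ins, not the study's measurements.

```python
import random
import statistics


def pearson(x, y):
    """Product-moment correlation of two equal-length sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def assign_lengths(lengths, attributes, tries=2000, seed=0):
    """Randomly reorder `lengths` many times and keep the ordering whose
    largest |r| with any attribute column is smallest."""
    rng = random.Random(seed)
    order = list(lengths)
    best, best_score = list(order), float("inf")
    for _ in range(tries):
        rng.shuffle(order)
        score = max(abs(pearson(order, col)) for col in attributes)
        if score < best_score:
            best, best_score = list(order), score
    return best, best_score


# Hypothetical stand-in data for 80 balls (not the study's measurements).
rng = random.Random(42)
size = [rng.uniform(1.0, 7.5) for _ in range(80)]                # inches
weight = [0.3 * s ** 2 * rng.uniform(0.5, 1.5) for s in size]    # ounces
elasticity = [rng.uniform(0.0, 0.68) for _ in range(80)]         # ratio
# String lengths from a normal curve, clipped to the 10-60 inch range.
lengths = [min(60.0, max(10.0, rng.gauss(35.0, 8.0))) for _ in range(80)]

assigned, worst_r = assign_lengths(lengths, [size, weight, elasticity])
print(f"largest |r| of string length with S, W, E: {worst_r:.3f}")
```

Size and weight are deliberately correlated in the stand-in data, while string length ends up nearly orthogonal to all three, which is the arrangement the text describes.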

The variables for the ball study 
were designed to relate to four ball 
“traits.” These were the size, weight,
elasticity, and string length of the balls 
and will be designated by S, W, E, 
and L. 

The psychologist always selects 
tests from a particular domain of 
interest. In the ball study, this do- 
main was restricted narrowly to the 
four expected physical influences, but 
the possibility existed, as in any 
study, for unexpected factors to be 
discovered. For example, some addi- 
tional factor of surface texture, or 
darkness of color, might have been 
found. The important point is that in devising variables such as might have interested Newton, we still left ourselves open, in rotation, to the possibility, within that domain, of an infinite number of resolutions unexpectedly different from those derived by an analysis of the physical influences.
Thirty-two tests were designed to evoke and to measure the “behavior” of the species of population we may call “balls-suspended-on-strings.” The first four items were the diameter, weight, height of rebound, and length of string, which, with “backstage” knowledge, one would recognize as pure factor markers for S, W, E, and L. The remaining 28 items may be characterized as dynamic, for most of them involve the ball, or the ball on its string, in motion and in interaction with equipment fabricated for this study. For example:

Item 5. The ball is allowed to roll 
down an inclined plane, a distance of 
4 feet, and the number of ball 
rotations is counted. 

It would be possible to factor only 
the 28 variables with the four pure 
marker variables left out until factor 
resolution had been obtained. Then 
these markers could be introduced 
later as a check on identification. 


Actually, we carried out the analyses both with and without the four marker variables, but only the full 32-variable analysis is reported here.²
To anticipate, it may be mentioned 
that the correlations between the 
factors eventually found (see Table 
5) closely approximate those found 
between the four marker tests. In- 
deed, the correlations between the 
markers were controlled and the 
lengths of string were cut so that 
string length would be nearly orthog- 
onal to the others. 

The variety of these ball behaviors was considerable, as can be seen from the list of variables below. Moreover, all of them allow errors of measurement to occur in a natural setting, not dissimilar from the psychological situation: on two successive trials, the ball will not behave in exactly the same way. The errors of measurement for this study, however, were appreciably smaller than in the typical psychological study. The reliabilities, computed from two trials for each of the variables over the 80 balls, ranged from .90 to almost 1.00.

² In a second experiment we carried out the procedures without the four marker variables. The results are almost indistinguishable from those reported here.
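The reliabilities quoted above are correlations between two trials over the same 80 balls. A minimal sketch of that computation, with hypothetical scores generated as a stable value per ball plus a small independent trial error (the error size and the helper names are our assumptions):

```python
import random
import statistics


def pearson(x, y):
    """Product-moment correlation of two equal-length sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def retest_reliability(trial1, trial2):
    """Reliability as the correlation between two trials over the same balls."""
    if len(trial1) != len(trial2):
        raise ValueError("both trials must cover the same balls")
    return pearson(trial1, trial2)


# Hypothetical: each ball has a stable "true" score on some item (say,
# inches rolled), and each trial adds a small independent error.
rng = random.Random(7)
true_scores = [rng.uniform(0.0, 48.0) for _ in range(80)]
trial1 = [t + rng.gauss(0.0, 2.0) for t in true_scores]
trial2 = [t + rng.gauss(0.0, 2.0) for t in true_scores]

r = retest_reliability(trial1, trial2)
print(f"reliability over 80 balls: {r:.3f}")
```

With trial error small relative to the spread between balls, the correlation lands in the .90-plus range the text reports; larger errors, as in typical psychological tests, would pull it down.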


List of Test Items 


1. Diameter of the ball. 

2. Weight of the ball. 

3. Height of rebound of the ball 
when dropped from 28 inches. 

4. Length of string attached to 
the ball. 

5. Number of rotations of the ball 
in rolling down the length of the in- 
clined plane, a distance of 4 feet. 

6. Distance from the eye the ball 
must be held to cover exactly the 
black circle on the wall (experimenter 
stands 5 feet from the circle which 
has a 1-foot diameter). 

7. Diameter of the shadow cast by 
the ball when the ball is placed mid- 
way between the wall and the flash- 
light (light source is 4 feet from the 
wall). 

8. Number of rotations of the ball 
to travel the distance covered by one 
rotation of the automobile tire (28- 
inch diameter). 

9. Number of squares and parts of 
squares covered by the ball if placed 
in the center of the checkerboard (1- 
inch squares). 

10. With the ball at one end of the 
fulcrum and the sliding weight placed 
so that the fulcrum is in balance, the 
distance of the weight from the ball 
is measured. 

11. After the ball rolls down the 
inclined plane and collides squarely 
with the black, cardboard box, the 
distance the box is moved from the 
foot of the plane is measured. 
12. The ball rolls down the inclined plane and strikes the paddle
wheel causing it to revolve and the 
number of rotations is counted. 

13. The ball drops 6 inches onto 
the springboard (a wire spring keeps 
the board horizontal) and the num- 
ber of inches the board is depressed 
is measured. 

14. The ball is placed in the net 
beneath one end of the springboard 
and the sliding weight is moved to a 
position where the board is horizon- 
tal; the distance of the weight from 
the end of the board is measured. 

15. The ball is dropped 36 inches onto a hard surface and the number of rebounds greater than ½ inch is counted.

16. The ball is dropped 36 inches to strike a hard surface inclined at 22.5 degrees with the horizontal; the distance the ball lands away from the surface is measured.

17. A croquet mallet on an axle swings down from a 45-degree angle and strikes the ball, causing it to roll up the inclined plane; the distance up the plane is measured.

18. The ball is dropped 36 inches onto a three-sided piece of rebounding equipment; the ball rebounds three times and rolls across the carpet to a stop, and the distance of the ball from the equipment is measured.

19. The ball is allowed to drop through a piece of rebounding equipment; the ball rebounds two or more times and rolls across the carpet, and the distance of the ball from the equipment is measured.

20. The ball suspended on its string from a hook attached to a wooden wheel is raised by revolving the wheel; the number of wheel revolutions is counted until the ball is raised to touch the wheel.

21. The ball is suspended on its string from the top of a board with knobs which are numbered consecutively, and the string is threaded around the knobs; the numbered knob closest to the ball is recorded.

22. The ball is hooked on its string 
behind a roller toy which is pulled 
forward causing the string to wind 
around the toy; the distance the ball 
travels before reaching the toy is 
measured. 

23. A “Hickory-Dickory” board has a pulley arrangement so that, when the ball and string are threaded around the pulley and the ball is pulled down, a toy mouse moves up the board; the number at which the ball and mouse meet is recorded.

24. The ball on its string is al- 
lowed to swing back and forth like a 
pendulum; the number of swings in a
15-second interval is counted. 

25. The ball is allowed to roll down 
the inclined plane and across the 
carpet until it comes to a stop; the 
distance from the plane is measured. 

26. The ball is dropped 24 inches 
onto a hard surface inclined at an 
angle of 22.5 degrees with the hori- 
zontal, causing the ball to rebound 
and roll across the carpet; the distance of the ball from the rebounding surface is measured.

27. The ball is dropped from a height of 18 inches into a salad bowl filled to the brim with water, and the water splashed out is collected and measured.

28. The ball rolls down a small in- 
clined channel and travels across a 
narrow board with numbered mold- 
ing strips; the number at which the 
ball comes to rest is recorded. 

29. The ball suspended on its 
string is allowed to swing down from 
an angle of 45 degrees to strike a 
solid rebounding surface; the number 
of rebounds is counted. 

30. The string attached to the ball is wound around the ball’s circumference; the number of windings is counted.

31. At a height of 68 inches a 
flashlight beam is directed to the 
floor beneath and the ball suspended 
on its string is hooked below the 
light source; the diameter of the ball’s 
shadow cast on the floor is measured. 

32. The ball suspended on its 
string is allowed to swing down from 
a 45-degree angle to strike a long 
narrow board; the number of inches 
that the board swings is recorded. 


FACTOR ANALYSIS 


The factor pattern to be expected for these variables can be predicted with fair precision if it is assumed that the factors which will be obtained are in fact S, W, E, and L. Equations from geometry and physics relating each test to the factors can then be solved. These results are shown in Table 2. Of course, the transition from the real world of empirical relations to an abstract world of perfect spheres, perfect vacua, frictionless surfaces, and errorless measurements may not always be so simple.

TABLE 2

PREDICTED LOADING PATTERN

Note. An X indicates a factor loading expected to be large in value; a zero indicates a trivial loading.

RAYMOND B. CATTELL AND KERN DICKMAN

TABLE 3

SIX LARGEST LATENT ROOTS

Root Size
13.78
7.00
6.46
1.65
.88
.47
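For a few items the idealized relations behind such predictions can be written down directly. The sketch below is our own derivation, not the authors' Table 2: it assumes perfect spheres, rolling without slipping, a point light source, a simple pendulum whose length runs from the hook to the ball's centre (string measured to the ball's surface), and one counted swing per full period. The function names are hypothetical.

```python
import math

G_IN_PER_S2 = 386.09  # standard gravity expressed in inches per second squared


def item5_rotations(d):
    """Turns made rolling 48 inches without slipping: 48 / (pi * d)."""
    return 48.0 / (math.pi * d)


def item6_eye_distance(d):
    """Eye-to-ball distance at which a ball of diameter d just covers a
    12-inch circle seen from 60 inches (similar triangles): 5 * d."""
    return 60.0 * d / 12.0


def item7_shadow(d):
    """Shadow diameter from a point source 48 in from the wall with the
    ball at 24 in (small-ball approximation): 2 * d."""
    return d * 48.0 / 24.0


def item8_rotations(d):
    """Ball turns needed to cover one turn of a 28-inch tire: 28 / d."""
    return 28.0 / d


def item24_swings(length, d, seconds=15.0):
    """Full pendulum periods in `seconds`; effective length is string
    length plus the ball's radius."""
    pivot_to_centre = length + d / 2.0
    period = 2.0 * math.pi * math.sqrt(pivot_to_centre / G_IN_PER_S2)
    return seconds / period


def item30_windings(length, d):
    """Times the string wraps around the ball's circumference: L / (pi * d)."""
    return length / (math.pi * d)


# Two hypothetical balls: (diameter in inches, string length in inches)
for d, length in [(1.5, 20.0), (6.0, 50.0)]:
    print(d, length,
          round(item6_eye_distance(d), 2),
          round(item8_rotations(d), 2),
          round(item24_swings(length, d), 2),
          round(item30_windings(length, d), 2))
```

Only diameter and string length enter these particular formulas, so under the idealization items 5 through 8 would be expected to load on S alone, and items 24 and 30 on L together with some S.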

One problem encountered in an 
actual factor analysis is determining 
the number of factors. There are 
differing opinions as to the best pro- 
cedure to follow. For this purpose, 
we used a theorem by Guttman 
(1954) and a formula by Kaiser. 
Guttman has demonstrated that with 
unities in the diagonals, the number 
of latent roots, k, of size 1.00 or 
larger, is a lower bound for the num- 
ber of factors. This is based on his 
proof that it is not possible to find 
proper communalities (values in the 
range from zero to 1.00 such that the 
matrix remains Gramian) which will 
reduce the rank of the matrix to less 
than k. Kaiser has shown that when 
a latent root is smaller than 1.00, the 
alpha-reliability of that principal axis 
factor becomes negative. Thus, by combining these two dissimilar approaches, Guttman’s theorem, which is algebraic, and Kaiser’s formula, which is in part statistical, k becomes both an upper and a lower bound for the number of factors. The six largest latent roots of the correlation matrix are shown in Table 3, from which we concluded that there are essentially four factors.³
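The counting rule can be applied directly to Table 3. Because the table lists the six largest roots in descending order, the 26 unlisted roots are all below .47, so counting over these six suffices. The alpha expression used here, (n/(n-1))(1 - 1/λ), is the form commonly attributed to Kaiser for a principal axis with root λ from n variables; that it is exactly the formula the authors relied on is an assumption.

```python
# Largest latent roots from Table 3 (unities in the diagonal, 32 variables)
ROOTS = [13.78, 7.00, 6.46, 1.65, 0.88, 0.47]
N_VARIABLES = 32


def roots_at_least_one(roots, threshold=1.0):
    """Guttman's lower bound: the number of latent roots >= 1.00."""
    return sum(1 for r in roots if r >= threshold)


def principal_axis_alpha(root, n_vars):
    """Alpha reliability of a principal axis with latent root `root`,
    in the commonly cited form (n / (n - 1)) * (1 - 1 / root)."""
    return (n_vars / (n_vars - 1)) * (1.0 - 1.0 / root)


k = roots_at_least_one(ROOTS)
print(k)  # 4
for root in ROOTS:
    print(f"root {root:5.2f}: alpha = {principal_axis_alpha(root, N_VARIABLES):+.3f}")
```

The alpha changes sign exactly at a root of 1.00, which is why the two criteria agree on k = 4 here.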

The next step was to use Thurstone’s iterative procedure (1947, pp. 294-295) to calculate communality estimates for the four factors. After 10 iterations the procedure was stopped. The estimates had converged to the point where they were stable to two and possibly three decimal places; the average change for all 32 estimates was less than .0005 from the ninth to the tenth iteration. Using these values as our communality estimates, we then extracted four factors by the centroid method (for uniformity with the most common practice in the research to which our generalizations are intended to apply).

³ The rationale for the factor analytic techniques used in this study is fully outlined in Chapter 3 of the study by Dickman (1960).
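The authors used Thurstone's iterative procedure followed by centroid extraction. As a stand-in only, the sketch below shows the widely used iterated principal-axis variant, which extracts factors by eigen-decomposition rather than by the centroid method. The squared-multiple-correlation starting values are our assumption; the stopping rule mirrors the average-change criterion described above. It is checked on a hypothetical one-factor correlation matrix whose true communalities are known.

```python
import numpy as np


def iterate_communalities(R, n_factors, max_iter=100, tol=0.0005):
    """Iterated principal-axis communality estimates.

    Starts from squared multiple correlations, places the current estimates
    in the diagonal, extracts `n_factors` principal axes, and replaces the
    diagonal with each variable's sum of squared loadings. Stops when the
    average absolute change falls below `tol`.
    """
    R = np.asarray(R, dtype=float)
    h2 = 1.0 - 1.0 / np.diag(np.linalg.inv(R))  # SMC starting values
    loadings = None
    for _ in range(max_iter):
        reduced = R.copy()
        np.fill_diagonal(reduced, h2)
        vals, vecs = np.linalg.eigh(reduced)          # ascending eigenvalues
        top = np.argsort(vals)[::-1][:n_factors]      # largest n_factors
        loadings = vecs[:, top] * np.sqrt(np.clip(vals[top], 0.0, None))
        new_h2 = np.sum(loadings ** 2, axis=1)
        change = np.mean(np.abs(new_h2 - h2))
        h2 = new_h2
        if change < tol:
            break
    return h2, loadings


# Hypothetical one-factor check: build R from known loadings.
true_loadings = np.array([0.9, 0.8, 0.7, 0.8, 0.6, 0.7])
R = np.outer(true_loadings, true_loadings)
np.fill_diagonal(R, 1.0)
h2, A = iterate_communalities(R, n_factors=1, tol=1e-10, max_iter=1000)
print(np.round(h2, 3))
```

With error-free data the true communalities are a fixed point of the iteration, so the estimates settle on the squared loadings; with real data they settle on values that make the reduced matrix as close to rank k as the procedure allows.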

It was immediately apparent from an inspection of the unrotated centroid factors⁴ that they were not interpretable. Moreover, they did not match the structure determined a priori by considering the mathematical and kinematic relations of the variables to the postulated four factors. Some rotation procedure is required for the factor loadings to make sense.

Table 4 shows the results of rotat- 
ing the four centroid factors by the 
use of the Varimax criterion (Kaiser, 
1958). Varimax, an analytic proce- 
dure which aims at simple structure 
under the restriction of orthogonal- 
ity, does such a good job when the 
factors are truly orthogonal that fur- 
ther adjustments by graphical means 
are not often required. Unfor- 
tunately, in nature, as was pointed 
out previously, orthogonal factors are 
as uncommon as a straight tree. 
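Varimax seeks the orthogonal rotation that maximizes, summed over factors, the variance of the squared loadings in each column. A minimal numpy sketch of one common SVD-based iteration for it (not Kaiser's original pairwise-rotation procedure, and without Kaiser's row normalization), applied to a hypothetical two-factor matrix spun 30 degrees off its simple-structure position:

```python
import numpy as np


def varimax(loadings, max_iter=200, tol=1e-10):
    """Raw varimax rotation (no Kaiser row normalization).

    Returns the rotated loadings and the orthogonal rotation matrix.
    """
    A = np.asarray(loadings, dtype=float)
    p, k = A.shape
    R = np.eye(k)
    crit = 0.0
    for _ in range(max_iter):
        L = A @ R
        # Gradient of the varimax criterion, projected onto rotations via SVD
        col_ss = np.sum(L ** 2, axis=0)
        G = A.T @ (L ** 3 - L * col_ss / p)
        u, s, vt = np.linalg.svd(G)
        R = u @ vt
        new_crit = np.sum(s)
        if crit > 0.0 and new_crit < crit * (1.0 + tol):
            break
        crit = new_crit
    return A @ R, R


def varimax_criterion(L):
    """Sum over columns of the variance of squared loadings."""
    return float(np.sum(np.var(np.asarray(L) ** 2, axis=0)))


# Hypothetical two-factor simple structure, spun 30 degrees off its axes.
simple = np.array([[0.8, 0.0], [0.7, 0.0], [0.9, 0.0],
                   [0.0, 0.8], [0.0, 0.7], [0.0, 0.6]])
t = np.radians(30.0)
spin = np.array([[np.cos(t), -np.sin(t)], [np.sin(t), np.cos(t)]])
unrotated = simple @ spin
rotated, R = varimax(unrotated)
print(np.round(rotated, 2))
```

Because R is orthogonal, each test's communality (row sum of squared loadings) is unchanged; only the distribution of variance across the columns moves. The constraint R'R = I is exactly what prevents the method from fitting two hyperplanes that meet at other than a right angle.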

When the simple structure is oblique, an orthogonal analytical search program such as Quartimax or Varimax will usually not fit hyperplanes to any of the clusters of points but instead, like Buridan’s ass, will be unable to choose between them. The primary factors for size and weight happen to correlate in the vicinity of .70, despite the selection of hollow and solid balls of dense and less dense materials. Figure 1, which shows the reference vector structure plot for size and weight, illustrates how impossible it is to get a good orthogonal fit under these circumstances. Here it is clear that if the hyperplane for weight is placed where it must obviously go, then the hyperplane for size cannot attain simple structure if placed orthogonal to it, or vice versa.

Fig. 1. Reference vector structure; size and weight (intermittent lines are hyperplanes).

TABLE 4

ORTHOGONAL VARIMAX FACTORS

⁴ A table containing nonrotated centroid factors has been deposited with the American Documentation Institute. Order Document No. 6911 from ADI Auxiliary Publications Project, Photoduplication Service, Library of Congress, Washington 25, D. C., remitting in advance $1.25 for microfilm or $1.25 for photocopies. Make checks payable to: Chief, Photoduplication Service, Library of Congress.
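Since the correlation between two primary factors is the cosine of the angle between their axes, the correlations quoted in the text translate directly into angles. A short sketch (the helper name is ours), using the .70 between size and weight and the cosine of .21 between size and elasticity reported with Figure 2:

```python
import math


def primary_angle_deg(r):
    """Angle in degrees between two unit-length primary factor axes
    whose correlation (cosine) is r."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("a correlation must lie in [-1, 1]")
    return math.degrees(math.acos(r))


print(f"{primary_angle_deg(0.70):.1f}")  # 45.6
print(f"{primary_angle_deg(0.21):.1f}")  # 77.9
```

Axes roughly 46 degrees apart are far from the 90 degrees an orthogonal solution imposes, which is the situation Figure 1 illustrates; at about 78 degrees, size and elasticity are closer to, but still not at, a right angle.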

Only in the case of the string 
length factor would orthogonality be 
a reasonable approximation, for it 
will be recalled that length was in 
fact controlled to be nearly orthog- 
onal to each of the other three fac- 
tors. It can be seen from an inspec- 
tion of Table 4 that, in this case only, 
the orthogonal factor is immediately 
interpretable and meaningful. How- 
ever, the marker variables as well as 
the clusters of variables associated 
with size, weight, and elasticity are 
also quite substantial in their loadings on other factors, and it is only when one considers their relative positions that the danger of a wrong interpretation appears.

Table 5 shows the results of rotating to an oblique simple structure. This was accomplished first by
applying an analytic criterion, Bi- 
normamin (Kaiser & Dickman, 1959), 
which aims at oblique simple struc- 
ture, and then by making final ad- 
justments from an inspection of the 
graphs of the factor pairs. Only 
one adjustment was required. In any 
case the objectivity of a simple struc- 
ture solution does not depend on the 
extent to which a machine replaces a 
psychologist’s judgment, but on an 
independent test of goodness of sim- 
ple structure. 

Simple structure as defined by Thurstone’s five rules (1947, p. 335) has a large number of near zeros, say values in the interval from -.10 to +.10, in the rows and columns of the reference vector structure. These near zeros define the hyperplanes, and the intersections of the planes are the primary factors. Table 6 shows the number of near zeros in each column for the unrotated centroid factors, for the orthogonal varimax factors, and for the oblique reference vector structure obtained by the combined binormamin-graphical method.⁵ Except for the L column, which, as stated, happens to be orthogonal to S, W, and E, the oblique factors are clearly superior to the orthogonal ones by this criterion of attainment of simple structure.

⁵ A near zero is counted only if the value divided by the square root of the test communality lies within the interval from -.10 to +.10.
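The counting rule of the footnote can be sketched as follows. The loadings and communalities are hypothetical; the example shows why the normalization matters, since a raw loading of .04 on a test with communality .04 is not a near zero once divided by √h² = .20.

```python
import math


def hyperplane_count(column, communalities, band=0.10):
    """Count near zeros in one reference-vector-structure column.

    Each loading is divided by the square root of its test's communality,
    and it counts as a near zero if the result lies within [-band, +band].
    """
    count = 0
    for value, h2 in zip(column, communalities):
        if h2 <= 0.0:
            raise ValueError("communalities must be positive")
        if abs(value / math.sqrt(h2)) <= band:
            count += 1
    return count


# Hypothetical column of five loadings and their test communalities
column = [0.03, 0.30, -0.05, 0.04, 0.60]
h2 = [0.25, 0.81, 0.36, 0.04, 0.90]
print(hyperplane_count(column, h2))          # 2
print(hyperplane_count(column, [1.0] * 5))   # 3 (no normalization)
```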

By using Bargmann’s test (1955) for the statistical significance of simple structure, the probability that a hyperplane count is a chance occurrence can be determined. If the probability is low, say less than .05 or .01, then the null hypothesis can be rejected, and it is concluded that the structure is inherent in the data and that the rotation procedure has found this inherent structure.

TABLE 5

REFERENCE VECTOR STRUCTURE

PRIMARY FACTOR PATTERN

TABLE 6

HYPERPLANE COUNTS

              S     W     E     L
Centroid      0     1     1     3
Varimax       9     8    10    21
Graphical    18    19    21    21

Note. Near zeros in the reference vector structure columns.

For this study, a hyperplane count of 11 or more indicates stability at the level p < .05, and if the count is 12 or more, the factor column is significantly stable at the level p < .01. If the Bargmann test were to be rigorously applied to the orthogonal factors, one would be unable to reject the null hypothesis that the zeros in the S, W, and E columns could occur by chance. On the other hand, the high hyperplane counts of 18 and more for the oblique solution indicate extremely significant results. The size of the critical region for a count of 18 is approximately 5 × 10⁻⁷.

Factor analysis is a procedure for 
expressing the relations of a battery 
of tests with a set of hypothetical 
factors. There are many uses for 
factor analysis, and there may be 
many situations where an investiga- 
tor may prefer to express these rela- 
tions upon a set of independent factor 
axes. In these situations, he should 
choose orthogonal axes, and Barg- 
mann’s test may be inappropriate. 
But if the investigator intends to 
interpret the factors in terms of 
common variance with the test vari- 
ables, then he should aim at simple 
structure. We have presented evidence to show that these hypothetical
constructs, the factors, relate more 
closely to natural phenomena when 
an oblique simple structure solution 
is obtained. 


SUMMARY 


1. With the aim of throwing light 
on the nature of the concepts gen- 
erated by factor analysis, through 
testing the method on a known phys- 
ical model, some 32 properties or be- 
haviors were measured for 80 balls 
varying in size, weight, elasticity, 
and the length of string on which 
some of their “performances” were 
measured. The variables were inter- 
correlated and factored. 

2. Applying the usual procedures 
in well controlled psychological use 
of factor analysis, we found: (a) that 
four factors were indicated as the 
appropriate number, which checks 
with the number of influences a 
physicist would posit in this situation; (b) that a simple structure of
high significance, by Bargmann’s 
test, existed in the data and was at- 
tainable by oblique rotation but not 
by orthogonal rotation; (c) that the 
factors obtained, recognized by unit 
loadings on some measures of size, 
weight, elasticity, and string length, 
proved to be precisely the four in- 
fluences which were predicted from a 
mathematical and kinematic analysis 
of the variables. 

3. The attempt at orthogonal sim- 
ple structure rotation failed on three 
counts: (a) it showed a tendency to 
fit hyperplanes to none of the clusters 
of points; (b) it did not yield a
statistically significant structure 
(Bargmann test) in the best position 
it could attain; (c) except in the case 
of string length, which happened to 
be orthogonal, the interpretation of 
the factors cannot be so clearly 
made as in the oblique case. 
Fig. 2. Primary factor pattern; size and elasticity (intermittent lines are orthogonal axes).


The last point deserves some expansion. Figure 2 shows the factor pattern plot of size against elasticity, drawn on oblique axes with a cosine of .21 (see Table 5).
Evidently, there is some tendency 
for larger balls to be more elastic, 
perhaps because they are more fre- 
quently pneumatic. From this dia- 
gram two summarizing points can be 
illustrated as follows. 

4. If one insists on erecting the 
axis for the second factor at a right 
angle to the first, its meaning be- 
comes a composite—a highly acci- 
dental composite—of the two in- 
fluences that have been clearly iso- 
lated by the oblique simple structure. 
Thus, in Figure 2 the variables close 
to the S axis (but not including the 
cluster about the origin), which are 
in meaning practically pure measures 
of size, would become, against the orthogonal factors shown in the intermittent lines, a composite structure. Similarly, those points close to E would also become composite if related to the orthogonal lines. Forcing orthogonality thus means creating conceptual hybrids which are neither one nor the other. The practice of completing psychological research by mechanically applying orthogonal analytic computer programs, when there is no prior knowledge that the factors are in the rare orthogonal relationship, is therefore likely to lead to a crop of misleading psychological conclusions. Considerable visual single-plane rotational adjustment is necessary to complete such research, and inspection of factor plots with subsequent adjustments may be necessary after oblique programs such as Oblimax or Binormamin are applied.
5. Psychologists who presume orthogonality sometimes say that they do so because, although factors may be oblique in the sample owing to sampling errors, the factors must essentially be orthogonal in the population. The latter is a false assumption. There is no reason why, in our interacting universe, they should be uncorrelated in the population, any more than age, weight, and stature factors are uncorrelated in the human population. The only place in which these influences are uncorrelated is as abstract conceptions in the scientist’s head. But if he wishes to obtain the purest and most accurate concepts of them, he must respect their obliquity where they actually occur. For if he insists on finding statistical independence for the entities which he believes exist, just because he thinks of them with conceptual independence, he may never find them at all.
A half century ago, Mott (1910)
published a statistical compilation of 
pairs of relatives admitted to London 
County Council mental hospitals and 
noted a preponderance of female 
pairs as against male pairs, and of 
same-sexed as against opposite-sexed 
pairs of relatives. Myerson (1925) 
made a similar study in Massachu- 
setts’ Taunton State Hospital and 
replicated Mott’s findings. Penrose 
(1942) postulated two autosomal 
genes to account for these different 
concordance rates regarding mental 
illness in the two sexes. His theoret- 
ical orientation stemmed from 
Freud’s observation that sexual in- 
version predisposed to mental illness, 
as illustrated in the case of Schreber 
(Freud, 1911). 

Penrose’s postulated Genes A and 
B were assumed to augment and ac- 
centuate femaleness and maleness, 
respectively. If a father had Gene A 
and passed it on to his son, both 
would have the inherited tendency 
to sexual inversion which would pre- 
dispose each to mental illness. If 
Gene A, however, was passed on to a daughter, her femininity would be enhanced or exaggerated, and she would escape the tendency to sexual inversion. Similarly, if mothers passed on Gene B to their daughters,
both would tend toward sexual in- 
version whereas their sons inheriting 
Gene B would be exceedingly mascu- 
line and would be less prone to de- 
velop mental illness. 

Slater (1944) believed that higher familial concordance in same-sexed as against opposite-sexed pairs had to be accepted as a well-established fact, but he objected to Penrose’s theory as an account of it. He pointed out that expressivity and penetrance were influenced by genetic, biochemical, and environmental factors, and that it was only necessary to suppose that some of these factors were more likely to be the same for persons of the same sex than for persons of the opposite sex. However, he made no attempt to indicate what these other genetic, biochemical, or environmental factors might be, so that his theory lacked the relative specificity of Penrose’s theory. Slater (1953b) subsequently considered the possibility of a sex-linked recessive gene in mental illness. He analyzed the frequencies of same-sexed and opposite-sexed pairs among the avuncular and sibling relationships in Mott’s series of cases, and concluded that a genetic explanation was not satisfying because of inconsistencies in the frequencies when both kinds of familial relationship were considered. Also, if mental illness were due to a sex-linked recessive gene, an excess of maternal uncle-nephew pairs should have been found as compared to all other avuncular pairs, but no such excess occurred. However, he still considered the possibility that a sex-linked recessive gene might be playing a causative role with respect to a tendency for mental deficiency and paranoid schizophrenia to occur in the same families among Mott’s cases.

Rosenthal (1959), in his analysis 
of Slater’s (1953a) schizophrenic twin
series, pointed out several differences 
between the sexes with respect to 
clinical features of the illness asso- 
ciated with concordance. Premorbid 
history, age of onset, deteriorative 
outcome, and subtype diagnosis dis- 
tinguished concordant from dis- 
cordant monozygotic (MZ) twins if 
they were male, but not if they were 
female. Attention was also called to 
the greater number of index cases 
and the higher concordance rate 
among female as compared to male 
MZ twins in Slater’s series. 

In a subsequent paper, Rosenthal 
(1961) noted that the preponderance 
of female twins obtained in the three 
largest of the five major twin studies 
of schizophrenia resulted from sam- 
pling among resident hospital popu- 
lations. Females were more likely to 
become inhabitants of the chronic 
wards than males. It was also pointed out that a sample obtained in large part from a chronic population showed higher concordance with respect to severity of the illness than a sample obtained primarily through consecutive admissions, so that relationships found between sex and concordance rates might be influenced by the kind of sampling procedure used.

Thus, Mott’s findings of half a 
century ago remain without any 
satisfying explanation, and the issues 
surrounding them seem to be more 
complex than originally believed. In 
this paper, it is intended to bring to- 
gether various findings in the litera- 
ture which are relevant to the theoret- 
ical considerations stated above, 
and to indicate the conclusions to- 
ward which these findings, in the ag- 
gregate, seem to be pointing. I shall 
attempt to determine whether higher 
familial concordance for females 
than males, and for same-sexed than 
opposite-sexed members, obtains
through all close blood relationships. 
I shall consider various possible ex- 
planations of these findings, trying 
to indicate the relative merits of 
genetic and environmentalist hypoth- 
eses intended to account for the 
findings. I shall focus on these mat- 
ters with respect to schizophrenia 
primarily, and shall use them to 
illuminate some obscure aspects of 
the etiology of this mental disorder. 
FINDINGS 

The closest familial relationship, 
probably in the interpersonal as well 
as the genetic sense, is to be found 
among MZ twins. There are, of 
course, no opposite-sexed MZ twins, 
but concordance rates can be com- 
pared for MZ male and female pairs. 
Of the five major twin studies which 
aimed at statistically representa- 
tive samples, four (Essen-Möller,
1941; Luxenburger, 1928; Rosanoff, 
Handy, Plesset, & Brush, 1934; 
Slater, 1953a) gave breakdowns of 
their MZ twin series by sex, and one 
(Kallmann, 1946) did not. All con- 
cordant and discordant MZ pairs re- 
ported in the four studies have herein 
been summed according to sex, and the
combined figures are shown in Table 1.

It can be seen from Table 1 that 
concordance rates are indeed higher 
for female than for male MZ twins. 
All four studies are in agreement on 
this point, although none reaches 
statistical significance by itself be- 
cause of the small numbers of cases. 
Indeed, the discordance rate is more 
than twice as high for the male as 
for the female MZ twins, represent- 
ing almost half of all male pairs, but 
less than one in four female pairs. 

This marked sex difference does 
not occur in Kallmann’s series, al- 
though the very slight difference in 
concordance rates which he reports 
TABLE 1

NUMBER OF MALE AND FEMALE PAIRS OF MONOZYGOTIC TWINS CONCORDANT AND DISCORDANT WITH RESPECT TO SCHIZOPHRENIA
(Four studies)

                                      Sex
                 Male   Percent    Female   Percent    Total   Percent
Concordant        23    (54.8)       47     (78.3)       70    (68.6)
Discordant        19    (45.2)       13     (21.7)       32    (31.4)
Total             42                 60                 102

Note. χ² = 5.328; at p = .02, χ² = 5.412.


is also in the direction of higher con- 
cordance among female MZ pairs. 
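The statistic in the note to Table 1 can be checked directly from the pooled counts. The sketch below assumes the reported value is a chi-square with Yates's continuity correction for the 2 × 2 table of sex by concordance; the helper name `yates_chi_square` is hypothetical, not the author's. Under that assumption it reproduces the 5.328 in the note, just short of the 5.412 needed for p = .02.

```python
# Minimal sketch, assuming a Yates-corrected chi-square for a 2 x 2 table.

def yates_chi_square(a, b, c, d):
    """Yates-corrected chi-square for the 2 x 2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    numerator = n * (abs(a * d - b * c) - n / 2) ** 2
    return numerator / ((a + b) * (c + d) * (a + c) * (b + d))

# Rows: female, male MZ pairs; columns: concordant, discordant (Table 1).
female = (47, 13)
male = (23, 19)

chi2 = yates_chi_square(*female, *male)
female_rate = 100 * female[0] / sum(female)
male_rate = 100 * male[0] / sum(male)

print(f"chi-square = {chi2:.3f}")
print(f"female concordance = {female_rate:.1f}%, male = {male_rate:.1f}%")
```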

When we attempt to analyze the 
concordance rates of male and female 
dizygotic (DZ) twins, we find that 
only the series of Rosanoff et al. and 
Slater can be combined statistically. 
Kallmann does not give the actual 
number of same-sexed male and fe- 
male DZ pairs. Luxenburger in a 
later report (1930) still had not 
found any concordant DZ pairs in his 
series of 37 DZ index cases. Essen-Möller
did not concern himself with
or report on his opposite-sexed pairs. 
The relevant data from the Rosanoff 
and Slater studies are summarized in 
Table 2. 

It can be seen from Table 2 that 
there are only 28 concordant DZ 
pairs in both studies combined. 
When these are divided into three 
sex categories, the Ns for analysis 


are rather small. Nevertheless, it is 
clear in Slater’s series that the con- 
cordance rate for female DZ pairs is 
higher than the rate for male DZ 
pairs, and that same-sexed pairs are 
more concordant than opposite-sexed 
pairs. In the Rosanoff et al. study, 
the concordance rate is higher for 
male DZ pairs than for female DZ 
pairs. There are only 11 male pairs 
in their series as compared to 42 
female and 48 opposite-sexed pairs, 
suggesting an unusual degree of 
sampling bias with respect to the 
male pairs. It will be noticed that 
the female: male ratio in the Rosanoff 
et al. study is approximately 4:1, a 
most deviant distribution, while in 
the Slater study the ratio is approxi- 


1 This is the only instance in the literature 
where, for any given blood relationship, I have 
found a higher concordance rate for male than 
female pairs with respect to schizophrenia. 


TABLE 2
CONCORDANCE WITH RESPECT TO SCHIZOPHRENIA IN MALE AND FEMALE DIZYGOTIC TWINS

                         Slaterᵃ                  Rosanoff et al.ᵇ              Totalᶜ
Sex of pairs     Concordant   Discordant    Concordant   Discordant    Concordant   Discordant
Male-male         2 ( 9.5%)       19         3 (27.3%)        8         5 (15.6%)       27
Female-female     9 (22.5%)       31         7 (16.7%)       35        16 (19.3%)       67
Male-female       2 ( 3.7%)       52         5 (10.4%)       43         7 ( 6.9%)       95

Note. From Slater (1951) and Rosanoff, Handy, Plesset, and Brush (1934-35).
ᵃ χ² = 8.18, p < .02.
ᵇ χ² = 2.21, p > .30.
ᶜ χ² = 6.50, p < .05.
TABLE 3

INCIDENCE OF CASE HISTORIES SUGGESTING “TRAUMATIC OR INFECTIOUS ETIOLOGY” IN ALL TWIN PAIRS DISCORDANT AS TO SCHIZOPHRENIA

                                        Sex of the schizophrenic twin
History                                  Male         Female      Total
History of trauma or infection        16 (41.0%)    6 (10.0%)      22
History negative for trauma
  or infection                           23           54           77
Total                                    39           60           99

Note. Data from Rosanoff et al., 1934. χ² = 11.43, p < .01.


mately 2:1, which also deviates 
markedly from theoretical expect- 
ancy. Some reasons for and implica- 
tions of such sampling biases have 
been discussed at length by Rosen- 
thal (1961). 

Nevertheless, the concordance rate 
is again higher for same-sexed than 
opposite-sexed pairs in the Rosanoff 
et al. study, although the difference 
does not reach statistical significance. 
Combining both studies (which at 
least provides larger Ns for each 
category, and which perhaps cancels 
out some of the sampling errors of 
each study) yields statistically signif- 
icant differences between the dif- 
ferent types of sex pairs. The con- 
cordance rate is somewhat higher for 
females than males, and the concord- 
ance rate for same-sexed pairs is 
18.3% as against 6.9% for opposite- 
sexed pairs, an appreciable dif- 
ference. 
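The percentages and statistics for the dizygotic pairs can be recomputed from the counts in Table 2. This sketch assumes the table notes report ordinary (uncorrected) Pearson chi-squares on each 3 × 2 table, with 2 degrees of freedom; `pearson_chi_square` is a hypothetical helper. With Slater's counts it gives 8.18, matching note a; with the Total columns as printed it gives about 6.52, close to the 6.50 of note c, and it reproduces the 18.3% versus 6.9% comparison in the text.

```python
# Sketch under the assumption of uncorrected Pearson chi-squares.

def pearson_chi_square(table):
    """Pearson chi-square for a table given as a list of rows of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    return chi2

# Rows: male-male, female-female, male-female; columns: concordant, discordant.
slater = [[2, 19], [9, 31], [2, 52]]
combined = [[5, 27], [16, 67], [7, 95]]  # "Total" columns as printed

same_concordant = combined[0][0] + combined[1][0]
same_pairs = sum(combined[0]) + sum(combined[1])
opposite_rate = 100 * combined[2][0] / sum(combined[2])

print(f"Slater chi-square   = {pearson_chi_square(slater):.2f}")
print(f"Combined chi-square = {pearson_chi_square(combined):.2f}")
print(f"same-sexed {100 * same_concordant / same_pairs:.1f}% vs "
      f"opposite-sexed {opposite_rate:.1f}%")
```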

Since twins comprise a special and 
unusual kind of familial group, we 
might pause in our investigation of 
familial concordance among the sexes 
to ask if there is any information 
available relevant to twins which
might account for the sex differences 
in concordance reviewed here. One 
finding worth noting was reported by 
Rosanoff et al. They examined the 
histories of all twin pairs, both MZ 
and DZ, in which only one twin was 
diagnosed schizophrenic, and observed that the proportion of affected cases having a “probably traumatic or infectious etiology” seemed to be
higher in the males than in the fe- 
males. The relevant data are brought 
together and summarized in Table 3. 

It can be seen from Table 3 that 
clinical histories suggesting traumatic 
or infectious etiology occurred about 
four times as often among the male 
as among the female schizophrenic 
twins. The sex difference in this respect is highly significant, statistically. Hereditarily speaking, such a finding is consistent with the hypothesis of genetic predisposition in the affected twin, and one could infer that manifestation of the illness might not have occurred without the
trauma or infection, the latter serv- 
ing as environmental factors which 
fostered and abetted manifestation. 
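The Table 3 comparison is again a 2 × 2 table. Its female count with a positive history is implied rather than printed: 60 female twins minus 54 with a negative history gives 6. Assuming, as for Table 1, that a Yates-corrected chi-square was used (an assumption, not stated by the source), the sketch below reproduces the reported 11.43 and the roughly fourfold difference in rates (41.0% against 10.0%).

```python
# Minimal sketch, assuming a Yates-corrected chi-square for Table 3.

def yates_chi_square(a, b, c, d):
    """Yates-corrected chi-square for the 2 x 2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    numerator = n * (abs(a * d - b * c) - n / 2) ** 2
    return numerator / ((a + b) * (c + d) * (a + c) * (b + d))

# Rows: history of trauma or infection, negative history.
# Columns: male, female schizophrenic twin (female positive count = 60 - 54).
male_pos, female_pos = 16, 60 - 54
male_neg, female_neg = 23, 54

chi2 = yates_chi_square(male_pos, female_pos, male_neg, female_neg)
male_pct = 100 * male_pos / (male_pos + male_neg)
female_pct = 100 * female_pos / (female_pos + female_neg)

print(f"chi-square = {chi2:.2f}")
print(f"trauma/infection: male {male_pct:.1f}%, female {female_pct:.1f}%")
print(f"ratio = {male_pct / female_pct:.1f}")
```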

The following considerations are 
raised by such an hypothesis: 

1. Has it been shown in any study 
heretofore that physical trauma or infection is etiologic with respect to
schizophrenia generally? Reviews of 
the literature relevant to this point 
indicate that, aside from a number of 
isolated case reports, there is no evidence to support this view (Bellak, 1947; Kety, 1959; Overholser &
Werkman, 1958). We must therefore 
question why such factors would be 
relevant to twins but not to single-born persons. Should such a finding
be confirmed with nontwins, the hy- 
pothesis would be bolstered consider- 
ably. 


2. Such a finding really explains 
higher discordance among male pairs
rather than higher concordance 
among females. In other words, the 
number of concordant pairs should be 
about the same for both sexes, but 
the number of discordant pairs 
should be greater among male twins, 
if the hypothesis is correct. The data 
in Tables 1 and 2 suggest that it is 
rather the higher frequency of con- 
cordant pairs among females which 
is distinguishing the two sexes. There 
may, however, be some sampling bias
involved here, as noted above, so 
that the hypothesis cannot be dis- 
credited with complete assurance on 
this point alone. 

3. The authors do not mention 
whether the frequency of trauma 
and infection was similarly or dis- 
similarly distributed among the non- 
affected twin partners. Presumably, 
they did not have this information 
since it would be so hard to come by. 
It would be reasonable to assume 
that physical trauma, if not infec- 
tion, occurs more frequently among 
males than females generally. If this 
is true, one would suppose that 
trauma and infection occurred more 
frequently among the nonaffected 
male twins than among the non- 
affected female twins. Moreover, the 
probably higher incidence of trauma 
among male twins would also imply a 
higher incidence of schizophrenia 
among males than among females 
generally. Although there is some 
evidence to show that such a dif- 
ference does occur (Landis & Page, 
1938; Malzberg, 1935), the difference 
is not large and probably not at all 
proportional to the difference be- 
tween the sexes with respect to the 
frequency of trauma. 

Thus, although the Rosanoff et al. 
hypothesis is a suggestive one, there 
is little to support it and some evi- 
dence against it. Another kind of in- 
TABLE 4
INCIDENCE OF “ETIOLOGICAL PSYCHIC FACTORS” IN THE CLINICAL HISTORIES OF MALE AND FEMALE DISCORDANT SCHIZOPHRENIC TWINS

                                       Male     Female
Psychic factors in evidence              6        21
Psychic factors not in evidence         33        39

Note. Data from Rosanoff et al., 1934. χ² = 3.65; at p = .05, χ² = 3.84.


formation relevant to the main prob- 
lem under discussion can, however, 
also be found in their study. The 
authors searched the same clinical 
histories of the affected twins among 
discordant pairs for evidence of 
“etiological psychic factors.” The
relevant data are summarized in 
Table 4. 
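The Table 4 statistic can be checked the same way. The counts used here are 6 of 39 affected males and 21 of 60 affected females with psychic factors in evidence (the male count follows from the 15.4% figure and the 39 male histories of Table 3). As before, the Yates correction is an assumption about how the published value was obtained; with it the sketch reproduces 3.65, just below the 3.84 required at the 5% level.

```python
# Minimal sketch, assuming a Yates-corrected chi-square for Table 4.

def yates_chi_square(a, b, c, d):
    """Yates-corrected chi-square for the 2 x 2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    numerator = n * (abs(a * d - b * c) - n / 2) ** 2
    return numerator / ((a + b) * (c + d) * (a + c) * (b + d))

# Rows: psychic factors in evidence / not in evidence; columns: male, female.
in_male, in_female = 6, 21
out_male, out_female = 33, 39

chi2 = yates_chi_square(in_male, in_female, out_male, out_female)
CRITICAL_05 = 3.84  # chi-square, 1 df, p = .05 (value quoted in the note)

print(f"chi-square = {chi2:.2f} (critical value {CRITICAL_05})")
print(f"male {100 * in_male / (in_male + out_male):.1f}%, "
      f"female {100 * in_female / (in_female + out_female):.1f}%")
```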

Table 4 shows that etiological psychic factors were in evidence among 35% of the affected female twins but among only 15.4% of the affected males. The difference falls barely short of the 5% level of significance. The authors state that it was especially those psychic factors having to do with the sphere of sex or love life which were more common in the female sex. If such “etiological” factors are more common among female twins, then they would be more likely to occur in both members of a female pair than of a male pair, and would thus make for a higher concordance rate for the female twins.

Against this hypothesis, it may be said that we do not know about the distribution of these factors in the nonaffected twin partners. One could well imagine that a similar distribution might have obtained among them, since problems around sex and the love life probably were at that
TABLE 5

CONCORDANCE AS TO SCHIZOPHRENIA IN SAME-SEXED AND OPPOSITE-SEXED DIZYGOTIC TWINS

                  Concordant pairs    Discordant pairs    Total
Same sex                 34                 262            296
Opposite sex             13                 208            221
Total                    47                 470            517

Note. Data from Kallmann, 1946. χ² = 4.15, p < .05.


time (and perhaps still are) more prev- 
alent among women than men 
(Offergeld, 1957). Moreover, one 
would normally have some reserva- 
tions about inferences regarding psy- 
chological problems when these inferences are made from hospital
records. It may be simply that the 
women were less loath to talk about 
psychological problems, especially 
sexual ones, than were the men. 


Gross (1959) found that women admitted to having psychopathological traits more freely than men, while men showed more conscious and unconscious denial and constriction of
behavior. Moreover, if such prob- 
lems are crucial etiologically with 
respect to schizophrenia, and if they 
occur more commonly among women 
than men, then the incidence of this 
disorder should be higher among fe- 
males. As noted above, frequency of 
hospitalization for schizophrenia in 
this country is, if anything, higher 
among males. 

In favor of the hypothesis is the 
clinical observation, so often reported 
that it is needless to cite references 
here, of morbid sexual preoccupation 
among schizophrenics of both sexes. 
Moreover, it is well established that 
good premorbid sexual adjustment 
is a favorable prognostic sign (Phil- 
lips, 1953; Wittman, 1941). But even 
if psychological disturbances in the 
erotic sphere could be proved to be 
of etiological significance in schizo- 
phrenia, it would still be necessary
to integrate theoretically the findings 
in Table 4 with data which suggest a 
probably higher incidence among 
males generally but a higher con- 
cordance among female twins. 

Kallmann (1946) has published 
data regarding the number of same- 
sexed and opposite-sexed DZ twin 
pairs concordant with respect to 
schizophrenia. These data are sum- 
marized in Table 5, in which it is 
again shown that the concordance 
rate is higher for the same-sexed 
pairs. 
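Kallmann's same-sexed versus opposite-sexed comparison in Table 5 can be checked with the same 2 × 2 calculation. Assuming a Yates correction (not stated in the source), the sketch reproduces the reported 4.15; the concordance rates come out at about 11.5% for same-sexed and 5.9% for opposite-sexed pairs.

```python
# Minimal sketch, assuming a Yates-corrected chi-square for Table 5.

def yates_chi_square(a, b, c, d):
    """Yates-corrected chi-square for the 2 x 2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    numerator = n * (abs(a * d - b * c) - n / 2) ** 2
    return numerator / ((a + b) * (c + d) * (a + c) * (b + d))

same = (34, 262)      # concordant, discordant same-sexed DZ pairs
opposite = (13, 208)  # concordant, discordant opposite-sexed DZ pairs

chi2 = yates_chi_square(*same, *opposite)
same_pct = 100 * same[0] / sum(same)
opposite_pct = 100 * opposite[0] / sum(opposite)

print(f"chi-square = {chi2:.2f}")
print(f"same-sexed {same_pct:.1f}% vs opposite-sexed {opposite_pct:.1f}%")
```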

Kallmann also stated that the con- 
cordance rate was 17.7% for female 
DZ pairs and 17.4% for male DZ 
pairs, as shown in Table 6. It is dif- 
ficult to evaluate these figures since 
there were only 34 concordant same- 
sexed DZ pairs in his series, and since 
the percentages cited represent mor- 
bidity risk estimates rather than ac- 
tual concordant pairs, as in Table 2. 
Kallmann also published concordance rates (morbidity risk estimates) for the siblings of twin index
cases, as shown in Table 6. When 
the sibling was of the same sex as the 
twin, the concordance rate was 
16.1%. When he was of the opposite 
sex, the rate was 12.3%. Among 
those siblings who were of the same 
sex as the twin index cases, the con- 
cordance rate was 16.3% for females 
and 15.9% for males. Although 
again we cannot infer the number 
of actual pairs, and the numbers are 
obviously small, the slight differences 
reported are in the expected direc- 
tion. In Slater’s series, which had 
about two females for every male 
index twin, the age-corrected inci- 
dence of schizophrenia among sibs 
of the index cases was 3.4% for male 
sibs and 7.3% for female sibs, a find- 
ing which is again in the direction of 
higher concordance among females. 

Concordance in sibling pairs where 
TABLE 6

MORBIDITY RISKS AS TO SCHIZOPHRENIA IN SIBLINGS AND DIZYGOTIC COTWINS OF TWIN INDEX CASES ACCORDING TO SEX

                  Siblings of twin index cases      Dizygotic cotwins
                  Sex of the twin index case        Sex of the twin index case
                  Male     Female    Total          Male     Female    Total
Same sex          15.9     16.3      16.1           17.4     17.7
Opposite sex                         12.3           10.5
All cases                            14.3           14.3

Note. Data from Kallmann, 1946.


neither is a twin has been reported in 
two extensive investigations by 
Schulz (1932) and Zehnder (1941). 
The relevant findings are summarized 
in Table 7. 

On the basis of sex distribution 
alone, we expect that the number of 
same-sexed pairs should be approxi- 
mately equal to the number of op- 
posite-sexed pairs. If we combine the 
figures from both studies (which are 
in general agreement on this point) 
we find 118 same-sexed pairs as
against 85 opposite-sexed pairs. The 
discrepancy differs significantly from 
the expected equal distribution. 
With respect to male versus female 
pairs, we again find a higher number 


TABLE 7

SEX OF SIBLINGS CONCORDANT WITH RESPECT TO SCHIZOPHRENIA

Sex               Schulz (1932)ᵃ    Zehnder (1940)    Total
Male-male              37                13              50
Female-female          38                30              68
Male-female            56                29              85
All pairs             131ᵇ               72             203

Note. χ², same- versus opposite-sexed pairs, = 5.04, p < .05 (using 1:1 expectancy).

ᵃ Schulz’s sample is based on siblings of a series of index cases collected by Rüdin at the University of Munich Psychiatric Clinic, whereas Zehnder simply ascertained pairs of affected sibs, without specifying either as an index case. From a genetic viewpoint, the former procedure is desirable. However, the studies agree well with respect to the proportions of same-sexed to opposite-sexed pairs.

ᵇ If sibs had been included of whose diagnosis Schulz was less certain, the figures would have been 43, 47, and 71 for the brother, sister, and brother-sister pairs, respectively.





of female than male pairs in Zehn- 
der’s sample. If we assume an ex- 
pected equality of pairs by sex, then 
the discrepancy from equality is 
statistically significant (p<.05). In 
the Schulz sample, however, equality 
appears to be almost perfect. But in 
this study, there were actually 367 
male index cases as against only 293 
female index cases in the original 
sample, or 55.6% male versus 44.4% 
female. Moreover, male index cases 
had 1079.5 sibs as compared to 880 
sibs of the female index cases, which 
would have increased the probability 
of having more brother pairs relative 
to sister pairs. But among pairs of 
siblings, the females still comprised 
over 50%. Thus, a difference of un- 
known magnitude in favor of female 
pairs occurred in the Schulz study as 
well. 
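The Table 7 note tests same-sexed against opposite-sexed pairs under a 1:1 expectancy. A minimal sketch, assuming a Yates-corrected goodness-of-fit test for two categories (the correction is an assumption; `yates_gof_equal` is a hypothetical helper): with 50 + 68 = 118 same-sexed pairs against 85 opposite-sexed pairs it reproduces the reported 5.04, and Zehnder's 30 sister pairs against 13 brother pairs give about 5.95, above the 3.84 needed for p < .05.

```python
# Minimal sketch: Yates-corrected chi-square for two counts under 1:1.

def yates_gof_equal(x, y):
    """Yates-corrected chi-square for two counts tested against a 1:1 split."""
    expected = (x + y) / 2
    return 2 * (abs(x - expected) - 0.5) ** 2 / expected

# Table 7 totals (Schulz + Zehnder).
male_pairs, female_pairs, mixed_pairs = 50, 68, 85
same_vs_opposite = yates_gof_equal(male_pairs + female_pairs, mixed_pairs)

# Zehnder alone: 30 sister pairs against 13 brother pairs.
zehnder_sex = yates_gof_equal(30, 13)

print(f"same vs opposite: {same_vs_opposite:.2f}")
print(f"Zehnder sisters vs brothers: {zehnder_sex:.2f}")
```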

It is relevant to point out at this 
time how sampling procedures may 
by themselves lead to such dif- 
ferences. Zehnder, for example, col- 
lected all pairs of siblings admitted to 
a single Swiss mental hospital over a 
20-year period. This sounds like a 
total sample which should therefore 
be free of biasing factors. However, 
we need only to consider that males 
are generally more migrant than 
females, and that the differential was 
probably at a peak during the early 
years of this century when most of 
these cases were admitted. If two 
brothers were both schizophrenic, 
but one had emigrated before his illness, this pair would not have been included in Zehnder’s sample. Sisters, however, who would probably both have been relatively more housebound, would have found their
way to the same hospital if both had 
become ill. 

In this regard it is interesting to 
note that there were 50 male pairs 
as compared to 85 opposite-sexed 
pairs in the Schulz and Zehnder 
studies combined. These values do 
not differ significantly from the ex- 
pected values of 45 and 90 pairs for 
the respective groups. However, the 
68 female pairs are significantly 
greater than one-half the number of 
opposite-sexed pairs. Thus, in com- 
paring same-sexed with opposite- 
sexed pairs, it is the group of female 
pairs which makes for the higher fre- 
quency of concordant siblings among 
the same-sexed pairs. The possibility 
also exists that same-sexed siblings, 
male or female, are more likely to be 
close to one another and live in the 
same area than are opposite-sexed 
siblings. Such factors alone, if 
proven correct, could account for 
the data reported in Table 7. 
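The expected values of 45 and 90 follow from random pairing of the sexes: brother, sister, and brother-sister pairs should occur in the ratio 1:1:2, so each same-sexed type is compared with mixed pairs at 1:2. The source does not name its test; this sketch assumes an uncorrected Pearson goodness-of-fit chi-square (`chi_square_gof` is a hypothetical helper). It gives about 0.83 for brothers, far from significance, and 8.50 for sisters, well above the 3.84 needed at the 5% level.

```python
# Sketch, assuming uncorrected Pearson goodness-of-fit tests.

def chi_square_gof(observed, proportions):
    """Pearson chi-square of observed counts against expected proportions."""
    total = sum(observed)
    expected = [total * p for p in proportions]
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Brother : sister : mixed = 1 : 1 : 2, so each same-sexed type vs mixed is 1 : 2.
brothers_vs_mixed = chi_square_gof([50, 85], [1 / 3, 2 / 3])  # expected 45, 90
sisters_vs_mixed = chi_square_gof([68, 85], [1 / 3, 2 / 3])   # expected 51, 102
CRITICAL_05 = 3.841  # chi-square, 1 df

print(f"brothers vs mixed: {brothers_vs_mixed:.2f}")
print(f"sisters vs mixed:  {sisters_vs_mixed:.2f}")
```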

With respect to research in this 
area, two points are worth making. 
The first follows directly from the 
immediately preceding discussion. In 
studies of concordance among sib- 
lings, each and every sibling in all 
families comprising the starting cases 
of the sample must be investigated. 
The same principle holds true for 
other blood relationships as well. It 
is not enough to report only known 
cases or readily available relatives. 
Of course, such a sampling ideal is in 
practice extremely difficult to
achieve, for obvious reasons.

This leads us to the second point, 
viz., that research on twins has an 
advantage over familial studies in 
that the sampling ideal just indicated 
can be more readily approached,
once one has defined his sample of 
starting cases. This is true because: 
(a) Twins are the same age. One does 
not have the problems of cases cover- 
ing two or more generations where 
the antecedent generation has been 
excessively decimated by deaths, or 
the subsequent generation has not 
yet reached or lived through an ap- 
preciable part of the morbidity risk 
period. (b) Twins are more readily
located. This is true because twins 
are less likely than other siblings or 
relatives to be separated (which can 
of course be a disadvantage from 
another point of view), and are more 
likely to keep in touch or know of 
each other’s whereabouts if they are 
separated. (c) The number and type 
of relatives to be ascertained are 
more clearly specified than in the 
case of some other familial groupings. 
An individual may have many sibs 
or none at all, but a twin always has just one cotwin, and only that particular person needs to be found
and studied. Moreover, from a gene- 
tic standpoint, the same paternity in 
the case of twins is always assured, 
but no similar guarantee obtains in 
the case of other siblings. (d) The 
representativeness of the sample is 
more easily verified since the distribu- 
tion of twins in the general popula- 
tion according to sex and zygosity is 
known and can serve as a standard 
against which the sample can be 
compared. 


CONCORDANCE BY SEX FOR PRIMARY 
AND COLLATERAL FAMILIAL 
RELATIONSHIPS 


In pursuing our inquiry further we 
may ask two questions of theoretical 
relevance: (a) Does the pattern of 
familial concordance by sex obtain 
only with the primary family group, 
i.e., parent, child, and sibling, or does 
it also extend to aunts, uncles, and 
cousins? If genetic factors are at
work, the extension to avuncular 
relationships should be found, but to 
a lesser extent. If not, hypotheses 
emphasizing intrafamilial influences 
of a psychological nature would be 
enhanced. (b) Does the pattern of
familial concordance by sex obtain 
for mental illness generally, or is it 
specific to those disorders called 
schizophrenic? 

To deal with these questions, we 
have recourse to large scale studies 
by Mott (1910), Myerson (1925), 
and Penrose (1945). Mott began his 
studies shortly after the turn of the 
century when he established a “Heredity Index,” a card-filing system in
which names, diagnoses, dates of ad- 
mission, and discharge, etc. were re- 
corded for all persons admitted to 
one of the London County Council 
Mental Hospitals who were known 
to have or to have had any other relative as a patient in one of these hospitals. The files were continued long after Mott’s original article in 1910, and Myerson (1925) and Slater
(1953b) have reported more recent 
compilations of the filed data, these 
larger figures being cited here in- 
stead of Mott’s original figures. 
Myerson (1925) used the records of 
the Taunton State Hospital in
Massachusetts to obtain a list of all 
pairs of relatives who had been ad- 
mitted to that institution. For both 
the Mott and Myerson studies, the 
figures reported here include all such 
pairs, no attempt being made to 
differentiate individuals diagnos- 
tically. They are simply called men- 
tally ill to a degree warranting certi- 
fication. 

Penrose’s unpublished study has 
the largest accumulation of pairs of 
relatives. His survey was based on 
records of patients admitted to all 
Ontario Hospitals. Because his Ns 
were sufficiently large, and because 
he separated out two groups of disorders which he called schizophrenic and affective, the figures for the two groups are listed separately.² Under
the schizophrenic disorders were in- 
cluded all patients who had been di- 
agnosed: schizophrenic—the simple, 
paranoid, catatonic, or hebephrenic 
subtypes; involutional paranoid or 
senile paranoid; dementia praecox; 
schizophrenic defective; arterio- 
sclerotic paranoid; and alcoholic par- 
anoid. Under the affective disorders 
were included all patients who had 
been diagnosed: manic; manic depressive; reactive depressive or psychoneurotic depressive; involutional depressive or melancholic; involutional psychosis; manic alcoholic; manic psychosis with mental defect; senile manic or depressive; arteriosclerotic manic or depressive. Clearly, these groupings will not satisfy everyone, perhaps not even many. The inclusion of known “organic” cases is questionable, and distinctions such as that between “involutional paranoid” and “involutional psychosis” may not be easy to make. Nevertheless, the attempt to relate cases in which a cognitive disorder predominates and cases in which an affective or mood disorder predominates is reasonable enough and is preferable to a labeling anarchy which generates a large number of discrete, idiosyncratic categories. The main relevant data of the three studies are brought together in Table 8.

With respect to the parent-child relationship, it can be seen in Table 8 that there were consistently more mothers (601) than fathers (455) and more mother-daughter (339) than father-son (209) pairs. This was true for the broad classification

2 Omitted here are all those cases diagnosed primarily as: arteriosclerotic, senile, paretic, other organic, epileptic, defective, and Huntington’s chorea.
TABLE 8
THE SEXUAL DISTRIBUTION OF PAIRS OF RELATIVES CERTIFIED MENTALLY ILL
(Three studies)

Relationship        Study             Criteria                 Both    Both     Male-     Female-
                                                               male    female   femaleᵃ   male
Parent and child    Mott (1910)       Mentally ill              78      137      103        96
                    Myerson (1925)    Mentally ill              55       80       59        56
                    Penrose (1945)    Schizophrenic disorder    28       55       16        51
                    Penrose (1945)    Affective disorder        48       67       68        59
                    Total                                      209      339      246       262
Sibling             Mott (1910)       Mentally ill             140      211
                    Myerson (1925)    Mentally ill              57       80
                    Penrose (1945)    Schizophrenic disorder   106      124
                    Penrose (1945)    Affective disorder        69      118
                    Total                                      372      533
Uncle, aunt and     Mott (1910)ᵈ      Mentally ill              67       73
nephew, niece       Myerson (1925)    Mentally ill              41       42
                    Penrose (1945)    Schizophrenic disorder    39       39
                    Penrose (1945)    Affective disorder        26       45
                    Total                                      173      199

ᵃ Male patient and female relative designates father-daughter and uncle-niece combinations. Female patient and male relative indicates mother-son and aunt-nephew combinations.
ᶜ Figures cited by Myerson (1925).
ᵈ Figures cited by Slater (1953).


“mentally ill,” but was most striking in Penrose’s group of schizophrenic disorders. In this group there were
106 mothers versus 44 fathers, and 
twice as many mother-daughter pairs 
as father-son pairs. These data may 
appear to have interesting psycho- 
logical implications, but can readily 
be explained simply on the basis that 
schizophrenic females are more likely 
to marry than schizophrenic males, 
and that married schizophrenic fe- 
males tend to have more children 
than their male counterparts (Essen-Möller, 1959). There was also a pre-
ponderance of mentally ill daughters 
(585) as compared to sons (471). 
This is less readily explained, al- 
though the possibility that more sons 
than daughters may have emigrated 
from the areas covered by the re- 
spective studies may have been an 
important factor. The possibility 
also exists that daughters of mentally 
ill parents are more likely to be affected than sons. It is interesting to
note that both the mentally ill 
mothers and fathers had more af- 
fected daughters than sons, the pro- 
portions of daughters to sons being 
56.6:43.4 in the case of the mothers 
and 54.0:46.0 in the case of the 
fathers. The slight difference is in 
favor of mother-daughter as com- 
pared to father-daughter pairs, which 
is in keeping with the previously 
noted findings of higher concordance 
among female than other familial 
pairs. 
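The parent and child totals quoted above are column sums from Table 8, using footnote a's reading of the mixed-sex columns (male patient with female relative as father-daughter, female patient with male relative as mother-son). The short sketch below, with hypothetical variable names, shows how each total is assembled.

```python
# Parent-child counts from Table 8 (all four series combined).
father_son = 209        # both male
mother_daughter = 339   # both female
father_daughter = 246   # male patient, female relative
mother_son = 262        # female patient, male relative

mothers = mother_daughter + mother_son
fathers = father_son + father_daughter
daughters = mother_daughter + father_daughter
sons = father_son + mother_son

# Penrose's schizophrenic group only.
schiz_mothers = 55 + 51   # mother-daughter + mother-son
schiz_fathers = 28 + 16   # father-son + father-daughter

print(mothers, fathers, daughters, sons)
print(schiz_mothers, schiz_fathers)
```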

With respect to Penrose’s schizo- 
phrenic group, the differences in con- 
cordance by sex are somewhat 
clearer. The certified mothers had 
slightly more affected daughters (55) 
than sons (51), whereas the fathers 
had more affected sons (28) than 
daughters (16). As a measure of the 
sex-concordance association, we may 
use the tetrachoric correlation which 
in this case is .25 or slightly less than 
two times its standard error. Although it does not reach statistical
significance, this finding is again con- 
sistent with the sex-concordance hy- 
pothesis. 
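The text does not say how the tetrachoric correlation of .25 was obtained. As an illustration only, the sketch below swaps in the well-known cosine-pi approximation, r ≈ cos(π / (1 + √OR)), where OR is the odds ratio of the 2 × 2 table of parent's sex by child's sex. For Penrose's schizophrenic parent-child pairs it gives about .24, close to the reported .25; the helper name is hypothetical.

```python
import math

def tetrachoric_cos_pi(a, b, c, d):
    """Cosine-pi approximation to the tetrachoric correlation of [[a, b], [c, d]]."""
    odds_ratio = (a * d) / (b * c)
    return math.cos(math.pi / (1 + math.sqrt(odds_ratio)))

# Rows: mother, father; columns: affected daughter, affected son.
# Same-sex pairs (mother-daughter, father-son) lie on the main diagonal.
r = tetrachoric_cos_pi(55, 51, 16, 28)
print(f"approximate tetrachoric r = {r:.2f}")
```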

The findings with respect to sib- 
lings are in agreement with the Schulz 
and Zehnder studies cited above. 
There were more sister than brother 
pairs (533:372) and more same- 
sexed (905) than opposite-sexed pairs 
(654). The differences are marked 
and highly significant, statistically. 
In Penrose’s schizophrenic group, the 
difference was less pronounced, but 
in the same direction. The question 
of sampling procedures which could 
be influenced by factors like differ- 
ential emigration rates in the sexes 
must be raised again. 

Such a factor would not account, 
however, for the fact that the number 
of brother pairs (372) exceeded more 
than one half the number of all 
brother-sister pairs (654). Chi square 
for this discrepancy from expectancy 
equals 5.78 which is significant at 
the .02 level. Thus again, we have a 
fairly firm finding of higher con- 
cordance in same-sexed as against 
opposite-sexed pairs of relatives. It 
should be noted, however, that most 
of the discrepancy occurred in the 
Mott series and not at all in Pen- 
rose’s group of affective disorders. 
Failure to replicate the main finding 
in the latter group may simply re- 
flect the fact that women generally 
are more susceptible to affective dis- 
orders than are men (Dayton, 1940; 
Landis & Page, 1938; Larsson & 
Sjögren, 1954; Malzberg, 1935).

When we examine the avuncular 
groups, we find that whereas among 
primary familial groups (parent- 
child and sibling) there were 872 fe- 
male pairs as against 581 male pairs, 
there were only 199 aunt-niece pairs 
as against 173 uncle-nephew pairs. 
This latter difference does not depart significantly from a 1:1 ratio.
Moreover, almost all the difference 
occurred in Penrose’s group of af- 
fective disorders, a finding which
again probably reflects the higher
incidence of these disorders in fe- 
males than males. 
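The contrast between primary and avuncular relationships can be put in numbers. Female pairs in the primary groups are mother-daughter plus sister pairs (339 + 533 = 872); male pairs are father-son plus brother pairs (209 + 372 = 581). The sketch assumes the same Yates-corrected 1:1 test used for Table 7 (an assumption; the text names no test here). The primary-group split is far beyond chance, while 199 aunt-niece against 173 uncle-nephew pairs gives only about 1.68, below the 3.84 needed at the 5% level.

```python
# Minimal sketch, assuming a Yates-corrected 1:1 goodness-of-fit test.

def yates_gof_equal(x, y):
    """Yates-corrected chi-square for two counts tested against a 1:1 split."""
    expected = (x + y) / 2
    return 2 * (abs(x - expected) - 0.5) ** 2 / expected

primary_female = 339 + 533   # mother-daughter + sister pairs
primary_male = 209 + 372     # father-son + brother pairs
avuncular_female, avuncular_male = 199, 173

primary_chi2 = yates_gof_equal(primary_female, primary_male)
avuncular_chi2 = yates_gof_equal(avuncular_female, avuncular_male)

print(f"primary:   {primary_chi2:.2f}")
print(f"avuncular: {avuncular_chi2:.2f}")
```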

Comparing same-sexed with op- 
posite-sexed avuncular pairs, we find 
a significantly higher frequency of 
same-sexed pairs in the Mott study 
but not in any of the other three 
groups.³ I have no idea why Mott’s
data deviate from the others in this 
respect. With respect to Penrose’s 
group of schizophrenic disorders, we 
find exactly the same number of 
aunt-niece as uncle-nephew pairs 
(39:39), and virtually the same 
number of same-sexed (78) as op- 
posite-sexed (77) pairs. Thus, the 
higher concordance of female than 
male pairs and of same-sexed than 
opposite-sexed pairs seems to obtain 
in the primary family groups but 
most of the evidence suggests that 
it no longer occurs, or occurs only 
to a very slight degree, when the 
familial relationships are one step re- 
moved. 

One could only wish that the sam- 
pling in these studies had been of a 
higher order so that one could have 
greater confidence in drawing con- 
clusions on a point of such high 
theoretical interest. Fortunately, one 
such study exists and we may use it 
as a check on the major findings and 
inferences just discussed. 

Penrose (1942) collected what was 
virtually a consecutive series of 500 
male and 500 female patients who 


3 It is worth noting that Slater (1953b) used 
the avuncular series in Mott's data to test the 
hypothesis of sex-linked recessives in mental 
illness and failed to find anything more than 
an infirm suggestion of a linkage between 
mental deficiency and paranoid schizophrenia. The Myerson (1925) and Penrose
(1945) data are not encouraging with respect 
to finding any sex-linked recessives. 
had been certified at the Ontario
Hospital, London. The known rela- 
tives of each case were investigated 
and classified according to whether 
or not they had suffered from mental 
illness. Thus, this was more than an 
attempt to obtain pairs from hospital 
records only. Unfortunately, the 
classification approached complete- 
ness only in the case of parents, chil- 
dren, and sibs of the 1,000 starting 
cases, complete ascertainment of 
grandparents, uncles, aunts, nephews, 
nieces, and cousins being more diffi- 
cult to achieve. Thus, although a 
substantial improvement, sampling 
still fell considerably short of ideal. 
Relatives were divided into three 
groups: (a) Those who had been certi- 
fied (excluding cases of mental de- 
fect or epilepsy without psychosis), 
(b) those who showed signs of psychosis but who had not been certified, and (c) a conglomerate group designated “psychopathic,” which we shall not include in our discussion.⁴

Rather than present the figures 
separately for each kind of familial 
relationship, I have arranged the 
pairs by sex according to whether the 
relationships were primary or sec- 
ondary. Certified and uncertified 
psychotic relatives were combined. 
In this way, we avoid too small Ns 
and yet address ourselves to our main 
hypotheses. The relevant data have 
been compiled as in Table 9. 

4 In this group were relatives described as nervous, excitable, hysterical, highly strung, neurasthenic, neurotic, hypochondriacal, depressive temperament, quarrelsome, bad tempered, unbalanced, peculiar, queer, eccentric, manneristic, bigamous, deserted family, in jail, alcoholic, or drug addict. Such a “group” clearly is not relevant to our inquiry, even if we disregard questions about the reliability of such descriptions.

TABLE 9

CONCORDANCE BY SEX WITH RESPECT TO PSYCHOSIS IN PRIMARY OR SECONDARY FAMILIAL RELATIONSHIPS

              Primary relationshipsᵃ     Secondary relationshipsᵇ
              Patient                    Patient
Relative      Male      Female           Male      Female
Female         59         93              57         66
Male           84         52              55         62
              rtet = .35                 rtet = .01

Note.—Data from Penrose, 1942.
ᵃ Includes parents, children, and siblings.
ᵇ Includes grandparents, uncles, aunts, nephews, nieces, and cousins.

It can be seen in Table 9 that there were slightly more female than male pairs in both the primary and secondary groups. The difference between the groups in this respect found in Table 8 no longer obtained.
The findings of Table 8 with respect 
to same- and opposite-sexed pairs 
however receive strong support in 
Table 9. Among primary familial 
groupings, there were 177 same-sexed 
as against 111 opposite-sexed pairs. 
The tetrachoric correlation equals 
.35, which is 3.77 times its standard 
error, p<.01. Among secondary 
familial groupings there were 121 
same-sexed as against 119 opposite- 
sexed pairs. The tetrachoric correla- 
tion equals .01, which is only about 
one-tenth its standard error, and of 
course not significantly different from 
zero. 
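The text does not say how these tetrachoric correlations were computed. As a hedged arithmetic check only, the sketch below applies Pearson's cos-pi approximation, r ≈ cos(π / (1 + √(ad/bc))), which is an assumption and not necessarily the procedure used in the original analysis. The cell counts are also an assumption: they are the values consistent with the legible entries of Table 9 and with the totals stated above (177 vs 111 and 121 vs 119 same- vs opposite-sexed pairs). The function name `tetrachoric_cos_pi` is hypothetical.

```python
import math

def tetrachoric_cos_pi(a: int, b: int, c: int, d: int) -> float:
    """Pearson's cos-pi approximation to the tetrachoric correlation.

    Cells of the 2x2 table (rows = sex of relative, columns = sex of patient):
        a = male relative,   male patient     b = male relative,   female patient
        c = female relative, male patient     d = female relative, female patient
    The diagonal cells a and d are the same-sexed pairs.
    """
    odds_ratio = (a * d) / (b * c)
    return math.cos(math.pi / (1.0 + math.sqrt(odds_ratio)))

# Assumed cell counts (see lead-in), Penrose (1942) data.
primary = dict(a=84, b=52, c=59, d=93)    # parents, children, siblings
secondary = dict(a=55, b=62, c=57, d=66)  # grandparents, avuncular pairs, cousins

for label, t in (("primary", primary), ("secondary", secondary)):
    same = t["a"] + t["d"]
    opposite = t["b"] + t["c"]
    r = tetrachoric_cos_pi(**t)
    print(f"{label}: same-sexed={same}, opposite-sexed={opposite}, r_tet~{r:.2f}")
```

With these counts the approximation gives about .35 for the primary and .01 for the secondary relationships, in line with the values reported; the standard error behind the ratio of 3.77 is not recomputed here.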

From a genetic point of view, mem- 
bers of pairs in the primary group 
share one-half of a common heredity, 
whereas those in the secondary group- 
ings share one-fourth (uncles, aunts, 
nephews, nieces, and grandparents) 
or one-eighth (cousins) of a common 
heredity. These values indicate the 
theoretical expectancies regarding the 
relative proportions of explained vari- 
ance in the association between sex 
and concordance among different 
familial groupings, if genetic factors 
are solely accountable. Since there 
appears to be virtually no association
at all between sex and concordance in 
the secondary group, one is left to 
conclude that factors other than 
genetic ones are involved in the 
association found in the primary 
familial groupings. It is assumed that 
a more complete ascertainment of 
relatives in the secondary group 
would not change the proportions 
shown in Table 9. This assumption 
seems tenable, since factors such as 
differential sex emigration should 
apply in all degrees of familial rela- 
tionship and should be manifest in 
even a partial sample that is rep- 
resentative, but it surely warrants 
further investigation. 
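The one-half, one-fourth, and one-eighth figures follow from Wright's coefficient of relationship. A minimal sketch, assuming no inbreeding and using a hypothetical helper name: for every path joining two relatives through a common ancestor, count the parent-child links n on that path and sum (1/2)^n over all such paths.

```python
from fractions import Fraction

def coefficient_of_relationship(path_lengths: list[int]) -> Fraction:
    """Wright's coefficient of relationship, assuming no inbreeding.

    Each entry is the number of parent-child links on one path that joins
    the two relatives through a common ancestor.
    """
    return sum((Fraction(1, 2) ** n for n in path_lengths), Fraction(0))

# Paths through common ancestors for each kind of relative pair.
relationships = {
    "parent-child": [1],                # direct line, one link
    "full sibs": [2, 2],                # via the mother and via the father
    "grandparent-grandchild": [2],      # direct line, two links
    "uncle/aunt-nephew/niece": [3, 3],  # via each parent of the uncle or aunt
    "first cousins": [4, 4],            # via each shared grandparent
}

for name, paths in relationships.items():
    print(f"{name}: {coefficient_of_relationship(paths)}")
```

Primary relationships (parents, children, sibs) come out at 1/2, avuncular and grandparental pairs at 1/4, and first cousins at 1/8.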

If other than genetic factors are im- 
plicated in sex differences in concor- 
dance rates, one might expect these 
differences to be accentuated in 
mental disorders defined primarily as 
the sharing by two people of characteristics of the illness in which heredity is probably playing less of a role. I refer to what has been known
historically as “disorders of association.” Diagnostically, patients with
such disturbances would often be 
subsumed under paranoid schizo- 
phrenia, but commonly too under 
hypochondriasis, these disorders not 
always being easy to differentiate. 
Since 1873, the term “folie à deux” has been applied most often to such cases (Lasègue & Falret, 1873). A
scholarly review of this subject has 
been written in an unpublished 
doctoral dissertation by Greenberg 
(1961) submitted to the University 
of Sydney. 

A predominance of female over 
male pairs with folie à deux has been found by a number of investigators who compiled series of cases, mostly from reports in the literature (Gralnick, 1942; Kröner, 1891; Marandon, 1894; Wollenberg, 1889). Since we
do not know the factors making for 
some cases being reported and others
not, we are unable to estimate the 
role of possible sampling bias in these 
reports. Recently, however, Green- 
berg’s (1961) study has become 
available and it is the first systemati- 
cally obtained sample of such cases, 
as far as I know. He states: 


From the Board of Control's central registry 
of admissions to mental hospitals in England
and Wales, instances of the more or less si- 
multaneous admission of two or more individ- 
uals, related either by blood or marriage, 
were noted for a period of five years. These, 
with the addition of several cases from the 
out-patient clinics of Guy's and St. Barthol- 
omew’s Hospitals, made up a total of 114 
cases, involving 234 individuals, each of which 
was then investigated individually. A number
of cases irrelevant to this investigation were
then excluded: those involving aged sibs living
together, whose increasing dementia had
precipitated their admission to hospital; 
related individuals developing psychoses 
apparently quite independently of one an- 
other; and several where the available clinical 
records were such as to preclude adequate 
appraisal. ... There remained 60 instances of 
folie à deux, à trois and à quatre, involving 124
individuals, and these were submitted to detailed
investigation. . . . The positive criteria
of selection were that the subjects should show 
very similar or identical clinical pictures; that 
they wholly or partially shared the same delu- 
sions; that there was presumptive evidence 
from the history that the development of these 
states was in some way interrelated. 


The criteria for inclusion of cases and the sampling procedures are relatively specific and straightforward. The sample turned up 13 mother-daughter pairs as against 2 father-son pairs, and 18 pairs of sisters as against 3 pairs of brothers. There were 6 mother-son pairs, no father-daughter pairs, and 3 brother-sister pairs. Thus, although there was a preponderance of same-sexed as against opposite-sexed pairs, the difference was accounted for primarily by the relatively high frequency of female pairs. Although one could raise the question of differential fertility rates as a possible explanation of the higher number of mothers
than fathers, nevertheless the ratio 
of mother-daughter to father-son 
pairs was almost identical with the 
ratio of sister to brother pairs, sug- 
gesting that the sex difference here 
may have been relatively independent 
of fertility factors. This suggestion 
is supported by the fact that among 
all parent-child combinations, same- 
sexed pairs occurred 2.5 times as 
often as opposite-sexed pairs. The 
high frequency of sister pairs, plus the 
concurrent finding of equal numbers 
of brother-sister and brother pairs 
(the numbers of course are small), 
pose a constellation of frequencies 
which strain any attempt to provide a
simple genetic explanation. 
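The ratios in the preceding paragraph can be checked directly from the pair counts reported for Greenberg's sample. A short sketch (the variable names are illustrative only):

```python
from fractions import Fraction

# Pair counts in Greenberg's (1961) folie a deux sample, as reported above.
pairs = {
    ("mother", "daughter"): 13,
    ("father", "son"): 2,
    ("sister", "sister"): 18,
    ("brother", "brother"): 3,
    ("mother", "son"): 6,
    ("father", "daughter"): 0,
    ("brother", "sister"): 3,
}
same_sexed = {("mother", "daughter"), ("father", "son"),
              ("sister", "sister"), ("brother", "brother")}

same = sum(n for k, n in pairs.items() if k in same_sexed)
opposite = sum(n for k, n in pairs.items() if k not in same_sexed)

# Female-to-male ratio within each kind of same-sexed pair.
parent_ratio = Fraction(pairs[("mother", "daughter")], pairs[("father", "son")])
sib_ratio = Fraction(pairs[("sister", "sister")], pairs[("brother", "brother")])

# Same- versus opposite-sexed pairs among parent-child combinations only.
parent_child_same = pairs[("mother", "daughter")] + pairs[("father", "son")]
parent_child_opp = pairs[("mother", "son")] + pairs[("father", "daughter")]

print(f"same-sexed {same}, opposite-sexed {opposite}")
print(f"mother-daughter : father-son = {float(parent_ratio)}")
print(f"sister : brother = {float(sib_ratio)}")
print(f"parent-child same : opposite = {parent_child_same / parent_child_opp}")
```

The two female-to-male ratios come out at 6.5 and 6, and same-sexed parent-child pairs outnumber opposite-sexed ones 15 to 6, the 2.5 ratio cited above.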


DISCUSSION 


Although relevant to a specific theoretical issue, the material covered in this analysis is quite heterogeneous, especially with respect to methods of sampling and diagnosis. In the main, the data point up concordance rates with respect to schizophrenia which are higher for female than male pairs and for same-sexed than opposite-sexed pairs of relatives in primary family groups but not in familial relationships further removed.

There are three possible ways of trying to account for such findings, and each warrants separate discussion: (a) the findings may be artifacts produced by vagaries of sampling, (b) the findings may be valid and explainable on a genetic basis, (c) the findings may be valid and explainable on a psychological basis.

1. In presenting the material, I have called attention to ways in which sampling procedures alone could have led to the findings in some studies. Factors such as differential migration of the sexes, incompleteness of ascertainment, severity of illness or age of the subjects in the
samples might have been contribu- 
tory in various ways. Can such fac- 
tors possibly account for all the find- 
ings? 

Differential migration rates must 
be considered as a serious and rele- 
vant source of sampling bias when 
pairs of relatives are ascertained only 
from records of hospitals in a fairly 
circumscribed area. The findings in 
the studies of Zehnder (1941), Schulz 
(1932), Mott (1910), Myerson (1925), 
Penrose (1945), and possibly Green- 
berg (1961) could conceivably have 
arisen from such a bias. However, in 
the studies of twins such a bias could 
not have occurred since the sampling 
involved single index cases rather 
than pairs of relatives, and since 
each ascertainable cotwin of every 
index case was included in the evalua- 
tion, even if the cotwin had emi- 
grated or died after reaching the age 
of morbidity risk. Since the con- 
cordance rates were found to be 
higher for female than male MZ pairs 
in the five major studies, and for 
female than male DZ pairs in Slater’s 
study, and for same-sexed than 
opposite-sexed pairs of DZ twins in 
the studies of Rosanoff et al., Slater, 
and Kallmann, we have evidence 
that factors other than differential 
migration rates must have been in- 
volved. Also, higher migration rates 
for males than females would not by 
themselves have accounted for a 
higher incidence of brother pairs 
relative to brother-sister pairs in the 
Mott, Myerson, and Penrose (1945) 
studies combined. One would have to 
add the qualification that brothers of 
females might have been more likely 
to migrate than brothers of males. 
Of course the brother pairs must have 
had sisters in their families and 
opposite-sexed pairs must have had 
other brothers, so that the point 
would have to be qualified further 
with respect to the incidence of the sex of other sibs, straining the hypothesis even more.

Moreover, we would expect that as 
the geographic area encompassed in 
the sampling increased, migration 
effects would have been propor- 
tionately reduced. Thus, sex-con- 
cordance ratios should have been 
less in a study like Greenberg’s which 
covered all of England and Wales 
than in a study like Zehnder’s or 
Myerson’s which sampled from one 
hospital, or than in a study like 
Mott’s or Penrose’s, which sampled 
from one county or province. Actually, the sex-concordance ratios
were highest in Greenberg’s study, 
and there was great similarity be- 
tween the ratios in Myerson’s study 
and those in the Mott and Penrose 
studies. 

Lastly, in a more fully ascertained 
sample, Penrose (1942) found higher 
concordance for same-sexed as 
against opposite-sexed pairs of sibs. 
Thus, although they may well have 
been contributory, differential migra- 
tion rates alone could not have ac- 
counted for the array of sex-con- 
cordance findings presented here. 

Incompleteness of sampling does 
not per se constitute a serious objec- 
tion to the kinds of data presented 
unless one can point toward specific 
biases which are likely to occur on a 
selective basis when some propor- 
tion of prospective cases is missed. 
One such bias, probably the most 
important, would be differential mi- 
gration rates in the sexes, but we have 
just seen that this bias alone could 
not account for many of the findings 
presented. A second possible bias, 
selective reporting of cases by sex, is 
somewhat obviated in studies which 
ascertained cases from hospital
records. We might, however, con- 
sider the possibility in these studies 
that relatives with the same family 
name would more likely have been
found than those with different fam- 
ily names. If so, more brother pairs 
should have been found than sister 
pairs since many sisters had probably 
married, attenuating the seriousness 
of such a possibility. However, if it is
true that concordance rates are really 
higher for female than male pairs 
among collateral relatives as well, 
then a disproportionate loss of aunt- 
niece pairs through changes of family 
name might have occurred in studies 
which ascertained cases from hospital 
records (Mott, 1910; Myerson, 1925; 
Penrose, 1945). Such loss could 
conceivably account for the fact that 
no sex difference was found among 
avuncular pairs in these studies. 
However, in the Penrose (1942) study
which ascertained cases through per- 
sonal inquiry among families, such 
loss is much less likely and we still 
find that the higher concordance 
rate among same-sexed pairs in the 
primary family groups no longer ob- 
tains in the collateral groups. The 
latter finding would have to assume a 
disproportionate loss among same- 
sexed pairs as against opposite-sexed 
avuncular pairs if the difference 
between primary and secondary fam- 
ily groups is to be attributed to sam- 
pling bias, and this assumption seems 
improbable. 

It has been shown with respect to 
MZ twins that when the severity of 
the illness in the index case is great, 
concordance is likely to be consider- 
ably higher than if the index case is 
only mildly ill (Rosenthal, 1961). 
Thus, one may wonder if a similar 
relationship obtains among other
relatives and if such a factor could 
explain the sex-concordances re- 
ported. However, there seems to be 
no reason to think that, for example 
in the twin studies, the males would 
have been less severely ill than the 
females. It has been pointed out in a few studies that males are actually more likely to get out of the hospital than females⁵ (Lengyel, 1941; Rosenthal, 1961; Sternberg, 1948), so that
even in the twin samples which were 
overloaded with chronic, severe cases, 
most of whom were women, the males 
ascertained as index cases would 
probably have been at least as 
severely ill as the female index cases. 
Thus, on this ground alone there 
would be no reason to expect higher 
concordance in the female than the 
male twin pairs, and it seems reason- 
able that this argument would extend 
to other pairs of relatives as well. 

One might conjecture that female 
patients are more likely than males to 
be communicative about themselves 
and their relatives, making for a more 
complete ascertainment of female 
than male cases. Such a conjecture is 
plausible enough, but it would not 
explain those findings where there 
was a higher frequency of brother 
pairs than brother-sister pairs. All 
told, it seems unlikely that sampling 
biases alone could have accounted 
for the main findings in this paper. 

2. Both Penrose (1942) and Slater 
(1944, 1953b) concerned themselves 
primarily with a possible genetic 
explanation of higher concordance for 
psychiatric disorder among same- 
sexed than opposite-sexed pairs of 
relatives. According to the data 
presented here, especially those re- 
lating to schizophrenics, such an ex- 
planation needs to be expanded to 
account also for the fact that this 
relationship no longer obtains among 
collateral pairs of relatives, that concordance rates are higher for female than male pairs, and that the frequency of admissions for schizophrenia is as high for males as for females, and maybe higher.

5 This statement is based on data obtained during the first half of this century. Since the advent of tranquilizing drugs, differences between the sexes in this regard may no longer occur, but all the studies reviewed here were done before these drugs were in widespread use.

Penrose’s theory of two autosomal 
sex-augmenting genes could be modi- 
fied to account for higher concord- 
ance in female than male pairs. It 
would only be necessary to postulate 
that the effect of the male-augment- 
ing Gene B is greater (has higher ex- 
pressivity) than that of the female- 
augmenting Gene A. Thus, sisters 
affected with Gene B would both be 
more likely to manifest aspects of 
sexual inversion than both brothers 
affected with Gene A, and the sisters 
would therefore be more highly pre- 
disposed to schizophrenia. However, 
even with this modification, one 
should still find these factors operat- 
ing in collateral as well as primary 
family groups and one should also 
expect to find a greater number of females admitted for schizophrenia than males. Since the best evidence
available goes contrary to these ex- 
pectations, the expanded theory still 
would not be adequate. 

The same point would apply to 
Slater’s theory (1944) which was 
drawn along lines similar to Pen- 
rose’s, but which objected to the 
specific “genetic” characteristics
assigned a key etiological role by 
Penrose. Slater preferred to talk 
more broadly in terms of genetic, 
biochemical, and environmental fac- 
tors which influenced expressivity 
and penetrance of the inherited trait 
(schizophrenia), supposing only that 
some of these were more similar for 
same than opposite-sexed pairs. 
Some of these factors could also have 
been said to have higher expressivity 
in females than males, thus account- 
ing for the higher concordance found 
in female pairs. However, further 
hypothetical qualifications would be
necessary to try to account for similar 
or lower admission rates for females 
than males and the virtual disappear- 
ance of the sex-concordance ratios 
among collateral relatives. These 
qualifications, even if conceptually 
possible, would strain the hypothesis 
considerably, making it unusually 
complex and unwieldy. Moreover, 
the theory would eventually have to 
specify what these factors might be 
and how they might facilitate or 
inhibit manifestation. Until it did, 
the theory could not be evaluated 
further. 

3. Psychological theories to ac- 
count for the above findings would 
not at all be hard to come by, and 
the task really becomes one of ex- 
cluding from consideration loosely 
conceptualized formulations which 
seem to be able to account for findings 
in one direction as well as another. 
Even if we limit ourselves to sound 
experimental and statistical studies in 
the psychological literature, we find 
such an abundance of heterogeneous 
studies of sex differences in personal- 
ity and behavior that a thorough 
analysis of them could not be at- 
tempted here. The findings are not 
always consistent, and methods, 
measures, source and age of subjects, 
and research goals differ widely 
among studies. Therefore, I shall 
selectively present a few studies 
which appear relevant to our in- 
quiry, briefly examining their ex- 
planatory power with respect to our 
major findings. The studies chosen 
bear on two lines of thought, not 
necessarily incompatible: one is the
“anxiety-generalization hypothesis”
of schizophrenia and the other has
been called “sex-role identification.”
I do not present these as “theories”
nor do I wish to convey the impression that I espouse either one. I
present them to show that findings
already exist in the psychological lit- 
erature which are at least consistent 
with the sex-concordance rates pre- 
sented above, to illustrate in two 
ways how findings obtained in study- 
ing the genetics of schizophrenia may 
be related to some studies of per- 
sonality traits, to point up some 
problems which such lines of thought 
are confronted with when they are 
considered as possible factors ac- 
counting for the sex-concordance 
rates reported, and to imply that 
such lines of thought are deserving of 
a fuller critical and theoretical exposi- 
tion than can be attempted here. 

If we assume that high levels of 
anxiety predispose to schizophrenia 
(Mednick, 1958), and that at pre- 
psychotic stages high anxiety levels 
are often manifested as “neurotic” traits, then, based on the above data, we should expect the correlation of such traits to be higher for female than male pairs and higher for same-sexed than opposite-sexed pairs of
relatives. Olson (1929) studied the
“nervous habits” of 201 pairs of
siblings in elementary school. The 
correlations were .32, .16, and .09 
for sister, brother, and sister-brother 
pairs, respectively. Crook (1937) 
administered the Bernreuter Per- 
sonality Inventory to 503 pairs of 
college students and found correlations for “neuroticism” of .35, .22,
and .13 for sister, brother, and sister- 
brother pairs, respectively. Carter 
(1933) administered the same inven- 
tory to 117 pairs of twins. Correla- 
tions for neuroticism were .61, .32, 
and .18 for MZ, same-sexed DZ, and 
opposite-sexed DZ pairs, respectively. 
The higher correlation for same-sexed 
than opposite-sexed DZ twins is 
consistent with the findings presented 
earlier, but the higher correlation for 
MZ than same-sexed DZ twins suggests that inherited factors may be importantly involved in neuroticism. Eysenck and Prell (1951) maintained this point of view, but a comparison of their study with Carter's
indicates that each was measuring 
quite different traits labeled neurot- 
icism by the respective authors. 
These correlations are consistent 
with the main findings of this article. 
Two studies of parent-child pairs 
regarding neuroticism on the Bern- 
reuter are not quite as consistent 
(Crook, 1937; Hoffeditz, 1934). In
both studies, mother-daughter pairs 
had higher correlations (.27 and .57) 
than father-son pairs (.06 and .05), 
but opposite-sexed parent-child pairs 
tended to have higher correlations 
(.23, .01, .23, .30) than father-son 
pairs. There is a suggestion that the 
ordering of most of these correlations 
could be accounted for in part by the 
assumption of a higher incidence of 
neurotic traits in females than males 
generally. There are a number of 
studies which support such an as- 
sumption (Castaneda, McCandless, 
& Palermo, 1956; Hattwick, 1937; 
Jersild & Holmes, 1935; Mathews, 
1923; Rosenblum & Callahan, 1958). 
However, if such factors have rele- 
vance for the found concordance rates 
by sex with respect to schizophrenia, 
they should also predict a higher in- 
cidence of schizophrenia in females 
than males, which is not the case, at 
least for hospital admissions. It may 
be possible that more females develop 
milder forms of the illness and are 
therefore less likely to be hospital- 
ized, accounting for the discrepancy 
from the prediction. Supporting this 
possibility is the well-established fact 
that females are generally admitted 
to hospitals later in life than males 
(Landis & Page, 1938; Larsson &
Sjögren, 1954; Malzberg, 1935) and
the finding of a much higher fre- 
quency of males than females with 
the most severely disorganizing subtype of the disorder, hebephrenia
(Schulz, 1932). 

If, however, anxiety predisposes to 
schizophrenia, and females are more
anxious, and more males are hospital- 
ized for the disorder, at least earlier 
in life, then we seem to be confronted 
with an apparent contradiction. It 
may be that (speaking in Mednick’s 
terms) generalization of anxiety oc- 
curs more readily in males than fe- 
males, fostering illness-inducing anx- 
iety-generalization spirals more fre- 
quently in males. Some evidence for 
such a possibility may be noted in a 
study by Sontag (1947) who found 
that girls were physiologically more 
reactive to stress than boys, but re- 
covered more quickly. 

At least one additional line of 
thought needs to be presented in an 
accounting of the main sex-concord- 
ance findings in this paper. If the 
aforementioned data are valid in that 
the sex-concordance ratios found in 
primary family relationships do not 
obtain among collateral relatives, we 
are led to infer that some factors 
peculiar to the structure of nuclear 
family life are contributing to those 
ratios. What might these factors be? 
In a previous paper, I examined Jackson’s (1960) “confusion of identity” hypothesis as one such possible factor and found it wanting (Rosenthal, 1960).

Psychologists have been paying 
increasing attention to a related con- 
cept borrowed from psychoanalysis, 
called “identification,” but they do 
not always agree on its definition, 
either in conceptual or operational 
terms. One aspect of the concept 
which has been studied in some detail 
has been called sex-role identification. 
To have explanatory power with re- 
spect to the sex-concordance ratios 
above, sex-role identification should 
be demonstrably greater among females than males, and among same-sexed than opposite-sexed pairs of
family members. 

Because children of both sexes have 
most contact during early rearing 
with their mother, they first identify 
with her, but the male child soon 
shifts to a masculine identification 
(Lynn, 1959). By age 3, most chil- 
dren are able to make sex-role dis- 
tinctions and this knowledge in- 
creases with age (Brown, 1956, 1958; 
Fauls & Smith, 1956; Rabban, 1950; 
Sears, Maccoby, & Levin, 1957). The 
girl retains her feminine identifica- 
tion, which is apparently not weak- 
ened even though she is likely to go 
through a stage of developing a pref- 
erence for more masculine activities. 
Girls are in more direct contact with 
their mothers than boys are with 
their fathers, so that whereas the girl 
is more likely to identify with a 
specific feminine model, viz., mother, 
the boy tends to identify with the 
cultural stereotype of masculinity 
rather than directly with his father 
(Lynn, 1959; Stoke, 1950). Women 
tend to be more like their mothers 
than their fathers in areas of major 
interest, but the reverse is not true 
for men (Beier & Ratzeburg, 1953; 
Gray & Klaus, 1956). However, both 
men and women tend to perceive 
themselves as more like their same- 
sexed than their opposite-sexed parent 
(Beier & Ratzeburg, 1953; Crook, 
1937; Gray & Klaus, 1956; Sopchak, 
1952). Boys with older sisters tend to 
be substantially more feminine than 
boys with younger sisters, but sisters
with older brothers show only a slight 
increase in masculine traits as com- 
pared to girls with younger brothers 
(Brim, 1958; Brown, 1956; Koch, 
1955). In the main, these findings 
lend support to the hypothesis that 
identification with same-sexed family 
members is stronger in females than 
males. 
Using an operational definition of
identification which was then related 
to patterns of psychopathology, Sop- 
chak (1952) found that male subjects 
who had tendencies toward abnor- 
mality on the MMPI showed a lack 
of identification in varying degree 
with their fathers, their mothers, and
“most people,” in that order, the
lack of identification in each case 
being positively correlated with scores 
on the Schizophrenia scale. Female 
subjects who had tendencies toward 
abnormality showed positive but not 
significant correlations between 
identification with mother and all 
types of abnormal trends. These 
findings could be interpreted in ways 
which would make them consistent 
with the sex-concordance ratios re- 
ported above, suggesting further lines 
of research. 

Even though the bodies of data 
presented have suggestive value in an 
accounting of the sex-concordance 
ratios found in studies of schizo- 
phrenia, the role of genetic factors 
cannot be excluded. However, if the 
found sex-concordance ratios are 
valid, it seems reasonable to conclude 
that some psychological factors are 
influencing these ratios in good part. 
Such a conclusion would be in accord 
with a previously reported finding of 
a group of schizophrenic cases where 
the genetic contribution to etiology 
was either minimal or absent (Rosen- 
thal, 1959), and with the finding that 
hereditary factors were not account- 
ing for as much of the variance with 
respect to schizophrenia as some 
leading investigators had supposed 
(Rosenthal, 1960). 


SUMMARY 


The literature regarding concordance rates with respect to schizophrenia among relatives of both sexes is reviewed. These rates are generally found to be higher for female than male pairs and higher for same-sexed than opposite-sexed pairs of relatives in primary family groups, but not among collateral relatives. The possible role of sampling errors, genetic contributions, and psychological factors in generating such sex-concordance ratios is examined. It seems reasonable to infer that psychological factors are influencing the sex-concordance ratios. Two lines of thought in the psychological literature, “anxiety-generalization” and “sex-role identification,” are briefly discussed as such possible factors.
THE PERCEPTION OF DEPTH THROUGH MOTION¹


MYRON L. BRAUNSTEIN²
University of Michigan 


The classical cues to depth percep- 
tion, as outlined in almost every gen- 
eral psychology text, do not ade- 
quately handle a class of depth 
phenomena which has occasionally 
been described in the literature and 
has only recently been systematically 
studied. The common characteristic 
of these phenomena is this: A vis- 
ual pattern, which when stationary 
is reported to appear two-dimensional 
by most observers, is transformed in 
some manner, and upon viewing this 
transformation at least some ob- 
servers report seeing a form moving in 
other than the frontal plane, seeing a 
three-dimensional object in motion, 
or seeing a three-dimensional scene. 
In most cases, the cues of binocular 
disparity, convergence, relative size, 
interposition, linear perspective, 
aerial perspective, motion parallax, 
light and shade, and accommodation, 
as classically defined, are ineffective,
for the stimuli are abstract figures 
projected onto a flat surface.

1 This paper is based on a chapter in a dissertation submitted to the Department of Psychology at the University of Michigan in partial fulfillment of the requirements for the PhD degree.

2 Now at Cornell Aeronautical Laboratory, Incorporated.

The author is grateful to J. J. Gibson of Cornell University for his critical reading of a preliminary draft of this paper.

“Motion perspective,” J. J. Gibson’s (1950) name for the perspective of change of position, as contrasted with the more familiar perspective of position, is closely related to these phenomena. As an object moves, the projections of its surface features and of its contours undergo certain regular transformations. As the observer
moves, the entire retinal image un- 
dergoes similar transformations. It is 
the transformation of the entire ret- 
inal image which Gibson refers to 
as motion perspective. The trans- 
formations which particular objects 
or patterns in motion undergo will 
also be termed motion perspective in 
this paper, for there does not appear 
to be any reason for a general distinc- 
tion between these aspects of depth 
perception, and no suitable label has 
been applied to transformations of 
parts of the visual field as a cue to 
depth. 

Almost none of the psychological 
studies carried out more than 10 
years ago, and few of the recent ones 
which illustrate motion perspective, 
were systematic attempts to study 
this aspect of depth perception. The 
earliest references to the part played 
by continuous transformations of 
the projections of objects in depth 
perception treat either the perception 
of motion in depth, or its partial 
failure, as an illusion. 


DEPTH ILLUSIONS BASED ON MOTION
The Windmill and Fan Illusions


Sinsteden (Boring, 1942, p. 270), Kenyon (1898), and Johnson (1927) reported discoveries of an illusion connected in the former case with a distant windmill and in the latter cases with a two-bladed electric fan. The blades of the windmill or fan were sufficiently distant from the observer to render most of the cues to their relative distance ineffective. As the blades moved, their direction of rotation was ambiguous and would appear to reverse from time to time. Of interest here is Kenyon’s report that the blades could at times be seen as rotating or oscillating in three dimensions, and at other times could be seen as expanding and contracting in two dimensions. Miles (1929, 1931) has discussed this illusion and demonstrated it using what he calls a “kinephantoscope.” This consists of a two-bladed fan rotating between a light source and a milk glass. Almost all observers at some time reported seeing the fan rotating. Rotation was seen in both directions. Most observers reported seeing the blades expanding and contracting in two dimensions at other times. In a second experiment, most observers reported being able to see the kind of motion called out by the experimenter within an allotted time.


Lissajous Figures 


Persons working with oscilloscopes are usually familiar with another “illusion” of motion in depth. If two oscillators are connected to an oscilloscope, one to the horizontal and one to the vertical input, and are adjusted to frequencies in simple numerical ratio, Lissajous patterns result. By slight mistuning of the frequencies, the patterns can be set into apparent motion. They may be perceived as rotating about a vertical or horizontal axis, depending upon which input receives the higher frequency. Speed of rotation varies with amount of mistuning. Complexity of the pattern is a function of the ratio of the frequencies used, 1:1 giving a circle, 2:1 a two-looped figure, etc. Direction of rotation and brightness of the pattern may also be readily varied. There is no “perspective” in Lissajous patterns and they consequently resemble three-dimensional wire figures shown in parallel projection.
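The frequency-ratio and mistuning relations described above can be illustrated with a short Python sketch. It assumes idealized sine-wave inputs, and the function names `lissajous` and `drifted_phase` are hypothetical rather than features of any particular oscilloscope. At a 1:1 ratio the trace is a circle only when the two signals are a quarter cycle apart, and it collapses to a diagonal line when they are in phase; a slight mistuning sweeps this phase continuously, so the figure keeps changing shape in the way that is seen as rotation.

```python
import math


def lissajous(fx, fy, phase=0.0, n=360):
    """Sample one cycle of x = sin(fx*t), y = sin(fy*t + phase).

    fx and fy are small integers giving the frequency ratio
    (1:1, 2:1, ...); t runs over the common period 0..2*pi.
    """
    points = []
    for k in range(n):
        t = 2 * math.pi * k / n
        points.append((math.sin(fx * t), math.sin(fy * t + phase)))
    return points


def drifted_phase(mistune_hz, seconds):
    """Relative phase accumulated when one input runs mistune_hz too fast.

    Over a short stretch the display looks like the exact-ratio figure
    drawn with this phase, so the pattern repeats every 1/mistune_hz s.
    """
    return (2 * math.pi * mistune_hz * seconds) % (2 * math.pi)


# Frames of a 1:1 pattern mistuned by 0.25 Hz, sampled once per second:
# the phase steps through 0, pi/2, pi, 3*pi/2, giving a diagonal line,
# a circle, the opposite diagonal, and a circle traced the other way.
frames = [lissajous(1, 1, drifted_phase(0.25, s)) for s in range(4)]
```

Sampling the phase drift rather than simulating the raw mistuned signals keeps the sketch short; it is an approximation that holds when the mistuning is small compared with the base frequencies.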

Rotating Lissajous patterns were introduced to the psychological literature by Weber (1930), who used two tuning forks arranged at right angles with mirrors pasted to their tips, which reflected light onto a screen. Weber found that with the tuning forks set in nearly a 1:1 ratio, a circle would be seen alternately rotating about each of two perpendicular diagonal axes. He discussed attitudinal influences on whether or not the figure was seen as rotating in three dimensions, and on the direction of perceived rotation.

Philip and Fisichelli (1945) used an oscilloscope and electronic oscillators to study parameters influencing the rate of reversal of apparent movement in Lissajous figures. Using frequencies in the ratios 4:1, 6:1, and 8:1, they instructed observers to press a key when the direction of movement of the pattern seemed to reverse. Increase in complexity of the figures and, to a lesser degree, increase in speed were found to enhance the rate of apparent reversal. Wide individual differences were found in the number of reversals. As would be expected for parallel projections, according to Gibson (1957), there was no significant overall preference for left or right direction of movement.

Fisichelli (1946), using the same apparatus, investigated the effects of axis of rotation and height-width ratio of the pattern upon the rate of apparent reversal, finding more reversals with a horizontal axis and a limited effect of height-width ratio. In both studies, observers reported brief interruptions of continuous rotary movement during observation of the figures. These were called “wiggle,” “flickering,” and “movement two ways at once,” and indicated perception of a two-dimensional rather than a three-dimensional figure. From the point of view of the present paper, it is unfortunate that there has been no subsequent research on the factors determining whether motion is seen in two dimensions or three dimensions when various Lissajous patterns are displayed under controlled conditions.


Stereokinetic Phenomena 


A class of depth “illusions,” to which Musatti (1924) has applied the term “stereokinetic,” appeared in the European literature several decades ago. A theoretical discussion of the effect, along with an extensive bibliography, may be found in a paper by Musatti (1931). The following description of the phenomena will be based on Metzger’s text (1953, Ch. 13).

The effect is produced by rotating certain patterns about an axis parallel to the line of regard. The physical motion thus takes place in the frontal plane. However, these two-dimensional figures, which give little or no impression of depth when viewed at rest, may take on a three-dimensional appearance while being rotated. Several examples of this effect follow:

If two ellipses are drawn such that the center of the smaller is between the center and edge of the larger, while the minor axes are on the same line, and the figure is rotated as described above, it may take on the appearance of a “lampshade,” rising from the plane on which the figures are drawn. If an ellipse is drawn with clockface markings and an arrow is drawn along its minor axis, rotation of the figure may cause the arrow to appear perpendicular to the plane of the clockface, pointing outward in the “third dimension” (Figure 1a). Figures composed of interlocking elliptical rings may yield perceptions of solid objects. Metzger (1953, pp. 334-335) presents four drawings which, when rotated, may appear to be a vase, a wineglass, an hourglass, and a double basin, respectively.

[Fig. 1. Examples of patterns which may appear three-dimensional when rotated: (a) Metzger’s “self-willed arrow”; (b) the overlapping rings of Wallach, Weisz, and Adams; (c) Fischer’s offset circles.]

A recent empirical study of the stereokinetic effect was reported by Wallach, Weisz, and Adams (1956). In one experiment, a white ellipse pasted onto a black cardboard disk was rotated at 20 rpm. After 30 seconds of binocular observation and 10 to 60 seconds of monocular observation, if an observer had failed to report observing a circular disk rolling around on its edge, this possibility was suggested to him. Of 47 observers, 6 reported seeing the tilted disk during monocular observation, without suggestion; 34 reported seeing it only after suggestion; 7 reported never seeing it. Most observers were able to describe aspects of the tilting disk not suggested by the experimenter, such as slant changes, indicating that they were not merely repeating the experimenter’s suggestions.

A similar procedure was used in a second experiment, in which the stimulus was a pattern of six overlapping rings (Figure 1b). None of 12 naive observers reported seeing the figure as three-dimensional when stationary, but 10 reported it as such during binocular observation of its rotation, and one more during monocular observation, leaving only one who required suggestion. The three-dimensional form was described as resembling a bedspring.

Fischer (1956) systematically studied the effects of several factors on the stereokinetic effect. These were off-set (the extent to which two circles overlapped, varying from concentric to tangential), placement (of the circles on the turntable), monocular versus binocular observation, equality versus inequality of circle size, and direction of rotation of the turntable (Figure 1c). Eleven stimulus figures were used. The observer was asked to estimate the “amount” of depth perceived by adjusting a sliding gauge.

For a depth effect to be obtained, it was necessary and sufficient that the distance between the centers of the two circles be greater than zero but less than r₁ + r₂, or 2r in the case of equal circles. Within these limits, amount of depth judged increased with increasing distance between the centers. A greater amount of depth was judged with monocular viewing.
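Fischer’s necessary-and-sufficient condition can be restated as a one-line predicate. The Python sketch below only encodes the reported limits and does not model how much depth is seen; the names `depth_expected` and `center_speed` are hypothetical. The second helper gives the linear speed of a circle’s center at distance rho from the turntable axis, which is the kind of differential rate of physical motion that Fischer weighed against a motion-parallax account.

```python
import math


def depth_expected(d, r1, r2):
    """Fischer's (1956) reported limits for a stereokinetic depth effect.

    d is the distance between the centers of two circles of radii r1
    and r2. Depth was reported only when the circles were neither
    concentric (d == 0) nor tangent or farther apart (d >= r1 + r2).
    """
    return 0 < d < r1 + r2


def center_speed(rho, rpm):
    """Linear speed of a point rho units from the turntable axis.

    A circle whose center lies farther from the axis sweeps a longer
    path per revolution and so moves faster across the display.
    """
    return 2 * math.pi * rho * rpm / 60.0


# Two equal circles of radius 2: only off-set, overlapping pairs qualify.
for d in (0.0, 1.0, 3.5, 4.0):
    print(d, depth_expected(d, 2.0, 2.0))
```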

A differential size cue was neither necessary nor sufficient to produce a depth effect. Concentric circles were generally reported as flat when rotated, despite size differences, while off-set circles of the same size elicited reports of perceived depth.

Fischer also eliminates motion parallax as a sufficient cue on the basis of a consideration of the differential rates of physical motion of the circles drawn on the rotating disk. Tangential circles had differential rates of motion but did not yield reports of depth perception, although when off-set was also present, the more rapidly moving circle tended to be judged as the nearer.

In addition to studies treating aspects of motion perspective as illusions, there has been research concerned primarily with other aspects of perception which has turned up instances of perceived depth in the absence of other cues.


Metzger's Research 


Metzger (1934a) devised apparatus for studying “phenomenal identity,” consisting of a turntable placed between a light source and a translucent screen. Rods could be placed in various positions on the turntable, standing upright. The subject could not see either end of the rods through the screen, and of course could not see the turntable. The distal stimulus was a number of parallel shadows of vertical lines, moving back and forth across the screen as the turntable revolved. Metzger’s purpose was to observe the changes in perceived identity which occurred when two shadows crossed and separated again. But he found an unexpected effect which he followed up in a subsequent study (1934b). Instead of shadows of lines moving back and forth in the plane of the screen, the subjects frequently reported perceptions of rods rotating in the third dimension. Imaginary lines connecting the projections of these rods would change only in length as the turntable rotated, rather than in both length and direction, as Wallach and O’Connell (1953) considered necessary for a kinetic depth effect. Metzger’s stimuli gave rise to reports of varying perceptions, readily influenced by suggestion, and were similar in effect to the “windmill” and “fan” illusions, discussed earlier.


Johansson's Research 


Johansson (1950) employed apparatus which permitted movable objects to be projected onto a translucent screen. The objects were drawn or pasted onto celluloid disks. As many as six disks could be employed simultaneously, and six mechanical systems independently controlled their motion. The disks could be moved in horizontal, vertical, sloping, circular, or elliptical paths, in a frontal plane. Shadows of the objects in motion were viewed by the observers through the translucent screen.

In a number of Johansson’s experiments, the observers reported perception of movement in three dimensions, although the distal stimulus was always two-dimensional. A three-dimensional perception was generally reported as secondary to a more easily elicited two-dimensional perception.

In one experiment a row of six bright spots on a homogeneous dark field was displayed. The two end spots were stationary. The two spots located one position from either end were simultaneously moved up and down within the same time interval, but with an amplitude approximately double that of the outer two. The reported perceptions were of spots on a harmonically swinging line. Several observers also reported perceptions of the spots as being knots on a rope swinging in the third dimension.

In another experiment, two dark spots on a homogeneously bright field were moved along perpendicular paths such that they met (and fused) at the center of their respective paths. Several varieties of two-dimensional perception were reported, but these did not include the “veridical” perception of two spots moving along perpendicular paths. Instead, the spots were reported as appearing to be two spots moving along a common sloping path, at times penetrating or passing through one another, and at times colliding and recoiling. The three-dimensional perception reported was that of the spots forming the terminals of a rod perpendicular to the frontal plane which ascends and descends an inclined axis on the frontal plane.

Additional reports of three-dimensional perceptions occurred in some of the many other experiments reported by Johansson.


MOTION AS A CUE IN DEPTH PERCEPTION


Research designed for the systematic study of motion perspective, or of the “kinetic depth effect” as the aspect of motion perspective under consideration has been called by Wallach and O’Connell (1953), has come primarily from three sources: J. J. Gibson and his associates at Cornell University, Wallach and his associates at Swarthmore, and B. F. Green and his group at the Massachusetts Institute of Technology, Lincoln Laboratory.


The Kinetic Depth Effect 


In a series of experiments, Wallach and O’Connell (1953) investigated the conditions leading to what they termed the kinetic depth effect. The effect is said to occur when a form placed between a point light source and a translucent screen casts a shadow which appears two-dimensional when the form is at rest, but casts shadows yielding perceptions of a three-dimensional form when the object is rotated.

Solid forms and wire outline figures in various shapes were rotated about a vertical axis. When such transformations resulted in shadows having contours which simultaneously changed in both length and direction, most observers reported three-dimensional perceptions. The light source was sufficiently distant to result in nearly parallel projections. In some cases, as would be expected, perceived direction of rotation appeared to be a chance matter, and spontaneous reversals of direction occurred.

In one experiment, three rods meeting at a point at angles of 110 degrees were rotated. If the ends of the rods were visible, three-dimensional perceptions were elicited, but if the ends were concealed, two-dimensional perceptions were reported, indicating that changes in direction of contours (i.e., sizes of angles) without changes in length of contours are insufficient for the kinetic depth effect.

In another experiment, a T-shaped wire figure and a wire equilateral triangle were rotated. The former was reported to appear two-dimensional while the latter was described as three-dimensional, indicating that changes in length of contours without changes in direction are insufficient for the kinetic depth effect. Other experiments demonstrated that if shadows consisted of several invariant elements, a kinetic depth effect was elicited if imaginary lines connecting the elements changed in both length and direction.
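The length-and-direction criterion can be checked numerically for simple shadow-casters. The Python sketch below assumes, as the text indicates, a light source distant enough for parallel projection and rotation about a vertical axis; `classify` and the other helper names are hypothetical. A wire segment tilted within a plane containing the axis casts a shadow that changes in both length and direction; the line joining the shadows of two upright rods placed symmetrically about the axis, as in Metzger’s display, changes only in length; and the shadow of a single upright rod changes in neither.

```python
import math


def rotate_y(p, theta):
    """Rotate the point p = (x, y, z) by theta radians about the vertical y axis."""
    x, y, z = p
    c, s = math.cos(theta), math.sin(theta)
    return (c * x + s * z, y, -s * x + c * z)


def shadow_vector(p, q, theta):
    """Parallel projection onto the screen (x, y) of segment pq after rotation."""
    a, b = rotate_y(p, theta), rotate_y(q, theta)
    return (b[0] - a[0], b[1] - a[1])


def classify(p, q, steps=36, tol=1e-9):
    """Return (length_changes, direction_changes) for the shadow of pq
    over one revolution sampled in `steps` equal angular increments."""
    v0 = shadow_vector(p, q, 0.0)
    len0 = math.hypot(*v0)
    length_changes = direction_changes = False
    for k in range(1, steps):
        vx, vy = shadow_vector(p, q, 2 * math.pi * k / steps)
        if abs(math.hypot(vx, vy) - len0) > tol:
            length_changes = True
        # A nonzero 2-D cross product means the shadow is no longer parallel to v0.
        if abs(v0[0] * vy - v0[1] * vx) > tol:
            direction_changes = True
    return length_changes, direction_changes


tilted_wire = classify((0, 0, 0), (1, 1, 0))    # (True, True)
rod_pair = classify((-1, 0, 0), (1, 0, 0))      # (True, False)
upright_rod = classify((1, 0, 0), (1, 1, 0))    # (False, False)
```

Comparing each frame against the first through a cross product, rather than through angles, avoids spurious “direction changes” when a line’s orientation wraps around 180 degrees.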
In a subsequent study of the kinetic depth effect (Wallach, O’Connell, & Neisser, 1953), wire figures were placed between the light source and screen; at rest, these cast shadows that a majority of the observers reported as two-dimensional. When a figure was turned back and forth, most observers reported perceiving shadows of three-dimensional forms. After intervals of from several minutes to a week, most observers, when presented again with the stationary shadows, reported perceiving them as three-dimensional, without rotation. Reversals of the Necker-cube type occurred after prolonged exposure to such stationary figures, indicating that the memory effect was resulting in more than a tendency to report three-dimensional perceptions when presented with a stationary figure previously shown in rotation.


Accuracy of Kinetic Depth Perception 


In a recent systematic study of kinetic depth perception, White and Mueser (1960) employed the type of display introduced by Metzger (1934a). Pegs were inserted into holes on a turntable which was located between a distant light source and an aperture covered by a translucent screen. When two pegs placed equidistant from the center of rotation were displayed as the turntable was rotated at a speed of 40 rpm, the observers reported perceiving motion in three dimensions during at least half of the exposure time. Duration of three-dimensional perception was increased with the use of fixation points to the left or right rather than at the center of the display, the use of pegs discriminably different in shape rather than identical in shape, and the use of horizontal as compared to vertical motion.
Most observers scored no better than would be expected by chance when asked to reproduce patterns generated by placing five pegs in a 4 × 5 matrix of holes on the turntable. When short thin pegs were added to the head and foot of the matrix as reference markers, accuracy exceeded chance expectations, and was further increased by increasing exposure time and by using elements discriminably different in shape.

The results are particularly interesting in that a depth effect was found for as few as two nonvarying elements although imaginary lines between them, in the two-dimensional projection, would change only in length and not in direction. This would indicate that the conclusion of Wallach and O’Connell (1953), that displays must contain lines which change in both length and direction to produce a kinetic depth effect, must be limited to the specific set of stimuli and conditions they employed.


Research with Slanted Surfaces 


Gibson and Gibson (1957) sought to answer four questions concerning the motion perceived when polar projections of a plane surface rotating about a vertical axis are viewed. First, would the perception of a changing slant of a constant shape always occur? Second, how accurate and variable are judgments of amount of change of slant with respect to the “extent” or “length” of the transformation sequence? Third, does the kind of texture of the surface affect the accuracy of the judgments? Finally, how accurate are the slant judgments obtained when a perspective view of a rotated plane is shown without showing the transformation leading up to it?
The apparatus used was termed a “shadow transformer.” It consisted basically of a turntable placed between a point light source and a translucent window. Four patterns, “an amoeboid group of amoeboid dark shapes or spots (the irregular texture), a solid amoeboid contour form (the irregular form), a square group of dark squares (the regular texture), and a solid square (the regular form)” were rotated to angles of from 15 degrees to 70 degrees (Gibson & Gibson, 1957, p. 132). Unlike Green’s dot figures, to be described below, Gibson’s spots do, of course, change in shape and size during rotation of the plane.

All observers who viewed the patterns in rotation reported seeing a constant shape changing in slant, although some reported that the display could at times be seen as the compression of a two-dimensional pattern. Slant judgments, made by adjusting a circular model, were in good psychophysical correspondence with the length of the transformation sequence. The use of form versus texture showed no effect on the accuracy of slant judgments. The regularity of the pattern had at most a slight effect. A control group, which was shown motionless slanted patterns, tended to see the rotated irregular textures as being in the plane of the screen, and although they reported perceiving slant in the rotated regular patterns, they grossly underestimated the degree of slant.

The importance of actually viewing the rotation to accurate judgment of slant was questioned by Sidorsky (1958). Using apparatus similar to Gibson’s, Sidorsky rotated a grid pattern of outline squares about a horizontal axis from 0 degrees to 40 degrees. The observers were shown only static views of the grid, at 2-degree steps. A shutter remained closed as the plane moved between views. Slant judgments were about as accurate as those obtained by Gibson and Gibson (1957) for planes shown in motion, and accordingly far more accurate than those obtained by them using static perspective views. J. J. Gibson³ attributes this discrepancy in results to the difference between the instructions used by Gibson and Gibson (1957) and those used by Sidorsky (1958). Sidorsky’s instructions, according to Gibson, gave his observers “considerable information.”

Somewhat related data come from Langdon (1951), who had the observers match a slanted circle to each of 15 ellipses. All were fluorescent wire outline figures, and were viewed monocularly with head movements avoided. Constancy was found lacking when the observer was required to adjust the circle to match an ellipse, but was restored when regular rotary motion of the circle was displayed and the observer was asked to press a button when the standard and comparison figures matched. Degree of constancy was found to vary directly with rate of rotation.
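The geometry behind Langdon’s matching task can be stated briefly. Under parallel (orthographic) projection, which is assumed here and only approximates monocular viewing at a finite distance, a circle slanted by an angle σ away from the frontal plane projects as an ellipse whose minor-to-major axis ratio is cos σ; conversely, an ellipse with axis ratio k is the projection of a circle slanted by arccos k. The Python helpers below (hypothetical names) simply encode that relation, for instance to express an ellipse as the slant that a perfectly constant observer would assign to it.

```python
import math


def ellipse_ratio(slant_deg):
    """Minor/major axis ratio of the parallel projection of a circle
    slanted slant_deg degrees away from the frontal plane."""
    return math.cos(math.radians(slant_deg))


def slant_for_ratio(ratio):
    """Slant (degrees) of the circle whose parallel projection has the
    given minor/major axis ratio (0 < ratio <= 1)."""
    if not 0 < ratio <= 1:
        raise ValueError("axis ratio must lie in (0, 1]")
    return math.degrees(math.acos(ratio))


# A 60-degree slant halves the circle's extent perpendicular to the slant axis.
print(round(ellipse_ratio(60), 6), round(slant_for_ratio(0.5), 6))  # prints: 0.5 60.0
```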

Langdon explains the effect of rotation in terms of the “creation of an object” resulting from the regular changes in shape of the wire circle. His study points up the relationship between kinetic depth phenomena and the shape constancy problem. The extent to which his results are attributable to the differences between the psychophysical methods employed in the two conditions (adjustment and limits), particularly to the differing modes of response, is uncertain.

3 Personal communication, July 1959.


Green's Computer Method 


Green (1957, 1959a, 1959b) has introduced methodology which allows great variety in the presentation of stimuli and permits greater control of the movements of the points composing the display. A number of high speed computers, such as the IBM 704 and 709, can be equipped with a cathode ray tube (CRT) output recorder. Instructions may be included in a computer program which will cause a spot to appear on the face of the CRT at specified coordinates. (The IBM 704 and 709 use a CRT output with a 1024 × 1024 grid, permitting a point to be plotted in any of 2²⁰ positions.) A camera attached to the CRT records each spot as it appears. The spot disappears in less time than it takes to plot the next point, allowing for even exposure of the spots. The shutter remains open while the points are plotted, permitting any number of spots to be recorded on a single film frame. After the desired number of points are plotted, an instruction in the program may be used to advance the film frame (IBM, 1955). It is thus possible to plot points in a certain pattern, photograph the pattern, compute the location of the points at small intervals as the pattern undergoes a mathematically specified transformation, and plot and photograph the pattern at each interval. The resulting photographs can be made into a motion picture of the transforming pattern. Plotting and photographing at sufficiently small intervals will result in the appearance of continuity in the transformation. Within the limits imposed by spot size and grid size, a film may be made of a two-dimensional projection of any mathematically specifiable transformation of any figure.

Essentially, a three-dimensional pattern (or a one-dimensional or two-dimensional pattern in three dimensions) is conceived or randomly generated. The coordinates of the points in the pattern are listed in an n × 3 matrix where each row represents a point and the column entries are the x, y, and z coordinates, respectively. Successive orthogonal transformations of the pattern may be accomplished by multiplying this matrix by a 3 × 3 orthogonal transformation matrix. A two-dimensional projection may be made of each of the points represented in an n × 3 matrix by multiplying both the x and y coordinates of each point by (E − F)/(E − Z), where Z is the z coordinate of the point, E is the distance of the projection point from the origin of the display, and F is the distance of the projection plane from the origin of the display (Green, 1959b).
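The matrix procedure just described can be sketched in a few lines of Python. Plain lists stand in for matrices so that no external library is needed, and the helper names, the random pattern, and the choice of a rotation about the vertical axis are illustrative assumptions, not Green’s program. Following the text, each point is a row of an n × 3 matrix, each new frame comes from multiplying that matrix by a 3 × 3 orthogonal matrix, and x and y are then scaled by (E − F)/(E − Z). The sketch further assumes that the projection point lies on the positive z axis at distance E, with the projection plane at distance F < E, so points nearer the projection point are magnified.

```python
import math
import random


def rotation_about_vertical(theta):
    """3 x 3 orthogonal matrix for row vectors: [x, y, z] times R rotates
    the point by theta radians about the vertical (y) axis."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, 0.0, -s],
            [0.0, 1.0, 0.0],
            [s, 0.0, c]]


def matmul(a, b):
    """Product of an n x 3 point matrix and a 3 x 3 transformation matrix."""
    return [[sum(row[k] * b[k][j] for k in range(3)) for j in range(3)]
            for row in a]


def project(points, E, F):
    """Scale x and y of each point by (E - F) / (E - Z)."""
    frame = []
    for x, y, z in points:
        scale = (E - F) / (E - z)
        frame.append((x * scale, y * scale))
    return frame


def film(points, step, n_frames, E, F):
    """Project, rotate by a small step, and repeat: one list per film frame."""
    R = rotation_about_vertical(step)
    frames = []
    for _ in range(n_frames):
        frames.append(project(points, E, F))
        points = matmul(points, R)
    return frames


# A randomly generated 10-point pattern inside a cube (an n x 3 matrix).
rng = random.Random(1)
pattern = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(10)]
frames = film(pattern, math.radians(2), 180, E=10.0, F=0.0)
```

Each element of `frames` corresponds to one film frame; mapping its coordinates onto a display grid such as the 1024 × 1024 raster mentioned above is left out.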

Green has carried out a series of studies “to determine the conditions under which the two-dimensional projection (e.g., shadow) of a rotating three-dimensional figure is perceived as a rigid coherent figure with depth” (1959a, p. 9). In these experiments the observers are shown motion pictures prepared as described above and asked to rate each stimulus film “on a subjective scale of coherence or rigidity—according to the degree to which the parts of the figure seem to maintain the same relative positions as the figure moves.” Ratings of coherence were found to increase with number of elements in the figure. Rated coherence was greatest for patterns shown rotating about a vertical axis, intermediate for patterns “tumbling” about a fixed origin, and lowest for patterns rotating about a skew axis, at an angle of 45 degrees from the vertical and the same angle from the plane of the display. Speed of rotation had little effect on the ratings except in the case of the slowest speed used, which showed a decrease in rated coherence.

Green’s results with line segments (1959a) were similar with respect to speed, numerosity, and method of rotation. Coherence was rated higher for a given number of line segments than for the same number of spots. Unlike the projections of the points, which did not change in size or shape as the patterns rotated, the projections of the line segments changed in length as the projections of their endpoints changed in distance on the plane of the display. Rated coherence was greater for connected line segments than for unconnected ones. The connected line segments produced figures similar in appearance to the wire figures used by Wallach and O’Connell (1953) in the study described above.


DISCUSSION 


Much of the work done in the area of kinetic depth perception was accomplished by psychologists of the gestalt school, and it is not surprising that gestalt principles have been applied to these phenomena. In a chapter on space perception, Koffka (1930) contends that whether a stimulus is seen as two-dimensional or three-dimensional depends upon which mode of organization allows for greater symmetry and unity. Referring specifically to the perception of figures in motion, he postulates a “tendency to make the total path (of all moving parts) as simple and well-shaped as possible” (1935, p. 301). We are left, as is usual in the case of gestalt explanations, with the problem of determining what the simplest, most symmetrical motion would be. One might readily determine which of several shapes is the simplest and most symmetrical, but when dealing with perceived motion, how does one, a priori, decide whether, for example, expansion-contraction or complete rotation has better “form”? Despite such theoretical difficulties, gestalt psychology has done much to fill the void in perceptual research left by behaviorism, and the theoretical commentaries of Wallach and his associates, and of Metzger, are worth reviewing.

The strongest, and perhaps the most controversial, point made by Wallach is that, since any single projection of the objects used as shadow-casters in his kinetic depth studies does not look three-dimensional, “the perceived three-dimensional form is not determined merely by what is presented on the retina at a given moment” (Wallach & O’Connell, 1953, p. 207). Instead, it is “necessary to ascribe to a memory trace the power to determine the organization of a visual form process” (Wallach, O’Connell, & Neisser, 1953, p. 364).

Gibson (1957) opposes reducing motion perspective to remembering:

Does a stimulus last for a second, a millisecond, or a microsecond? ... Is it not theoretically preferable to suppose that a transformation is a stimulus in its own right, just as a nontransformation is a stimulus? Or, still better, that sequence, as well as pattern, is a variable of stimulation? (p. 136).⁴


Metzger (1953, p. 335) also argues that the motion itself, and not a succession of stationary forms, should be considered the stimulus. He goes on to assert that the principles applicable to stationary patterns are also applicable to patterns in motion, if instead of considering their three-dimensional form, we consider the form of their motion (1953, p. 345). That is, as he previously proposed (1934b, p. 258), tendencies toward unity, simplicity, symmetry, continuity, and good form of the ongoing motion form the basis of kinetic depth phenomena. Metzger’s formulation is similar to Koffka’s, and is subject to the same difficulties.

4 The concept of a transformation serving as a stimulus is also found in the biological model of perception of Pitts and McCulloch (1947). Of particular relevance here is their formulation of the exchangeability of time and space in form perception.

A perceptual theory handling the 
phenomena described in this paper, 
and suggesting numerous problems 
for experimental investigation, is that 
of J. J. Gibson. Basic to his theory is 
the postulate that “The stimulus-
variable within the retinal image to
which a property of visual space cor- 
responds need be only a correlate of 
that property, not a copy of it” (1950, 
p. 8). Kinetic depth perception 
would then be studied by seeking 
the stimulus correlates for depth, in 
the optic arrays associated with 
moving objects, and Gibson has 
stated an hypothesis about what 
these are: “Any regular transforma-
tion of a bidimensional image tends
to yield a tridimensional motion in 
perception, and the kind of motion 
perceived depends on the kind of 
transformation” (1954, p. 311). This 
principle is expressed again in the 
postulate that “An eye is a device 
which registers the flow pattern of an 
optic array as well as the static pat- 
tern of an array. Conversely, such a 
family of continuous transformations 
is a stimulus for an eye. There are 
quite specific forms of continuous 
transformations, and the visual sys- 
tem can probably discriminate among 
them” (1958, p. 185). It follows that
“A psychophysics of kinetic impres-
sions would require a mathematical
analysis and classification of the
motions or transformations of a
retinal image” (1954, p. 312). Such
an analysis might begin with a con-
sideration of the kinds of motions
possible in an optic array, and Gibson
(1954, 1957) discusses these motions.
There are the rigid motions of transla- 
tion and rotation, which, since each 
can occur with respect to either of 
three axes, give us six continuous 
perspective transformations. There 
are also nonperspective or elastic 
transformations, characteristic of liv- 
ing organisms, and finally disjunctive 
motions of the parts of a pattern. 
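The six rigid motions can be made concrete with a small sketch. It assumes a simple central (pinhole) projection, a hypothetical square of four dots five units in front of the eye, and arbitrary small step sizes. None of these come from Gibson. For each motion the sketch computes how the projected dots move. The six motions give distinguishable image transformations: a uniform shift for lateral translation, a radial expansion for approach, a rotation within the image for rotation in the frontal plane, and an uneven, trapezoid-like distortion for rotations in depth.

```python
import math

def perspective(point, f=1.0):
    """Central projection onto an image plane at distance f from the eye (z > 0 is depth)."""
    x, y, z = point
    return (f * x / z, f * y / z)

def translate(point, axis, d):
    """Rigid translation by d along the x, y, or z axis."""
    coords = list(point)
    coords["xyz".index(axis)] += d
    return tuple(coords)

def rotate(point, axis, theta, center):
    """Rigid rotation by theta radians about an axis through `center`."""
    x, y, z = (pi - ci for pi, ci in zip(point, center))
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    if axis == "x":
        x, y, z = x, y * cos_t - z * sin_t, y * sin_t + z * cos_t
    elif axis == "y":
        x, y, z = x * cos_t + z * sin_t, y, -x * sin_t + z * cos_t
    else:  # "z": rotation within the frontal plane
        x, y, z = x * cos_t - y * sin_t, x * sin_t + y * cos_t, z
    return (x + center[0], y + center[1], z + center[2])

def image_flow(points, motion):
    """Displacement of each projected point produced by one small rigid motion."""
    flow = []
    for p in points:
        (u0, v0), (u1, v1) = perspective(p), perspective(motion(p))
        flow.append((u1 - u0, v1 - v0))
    return flow

# Hypothetical object: the four corners of a square 5 units in front of the eye.
CENTER = (0.0, 0.0, 5.0)
SQUARE = [(sx, sy, 5.0) for sx in (-1.0, 1.0) for sy in (-1.0, 1.0)]

# The six continuous rigid motions: three translations and three rotations.
MOTIONS = {
    "translate x": lambda p: translate(p, "x", 0.1),
    "translate y": lambda p: translate(p, "y", 0.1),
    "translate z (approach)": lambda p: translate(p, "z", -0.5),
    "rotate x": lambda p: rotate(p, "x", 0.1, CENTER),
    "rotate y": lambda p: rotate(p, "y", 0.1, CENTER),
    "rotate z": lambda p: rotate(p, "z", 0.1, CENTER),
}

for name, motion in MOTIONS.items():
    print(name, [tuple(round(c, 3) for c in d) for d in image_flow(SQUARE, motion)])
```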

Gibson (1951, pp. 404-405) dis-
putes “the classical assumption that
two-dimensional vision is immediate,
primitive or sensory, while three-
dimensional vision is secondary,
derived or perceptual” and suggests
that other theories may fail to be
convincing in their explanations of
three-dimensional perception because
they are guided by this assumption.
Certainly the inverse assumption
would lead to a different orientation
in the kind of research discussed in
this paper, for the classical assump-
tion seems, along with technical
difficulties in experimentation, re-
sponsible for the fact that although
systematic work has long been carried
on with stationary two-dimensional
forms, kinetic depth phenomena, un-
til recently, were only “curious illu-
sions.”

Gibson’s theory is a promising 
approach to the systematic study of 
depth perception, and his research 
has contributed considerably to our 
knowledge of this area. It is Green’s 
empirical approach, however, largely 
because of its more sophisticated 
methodology, which lends the greatest 
promise to the actual development of 
a psychophysics of depth perception. 
Although the contribution in the area 
of mechanics of creating stimuli is 
important, in that it has made such 
research more practical, it is the crea- 
tion of stimuli directly from mathe- 
matical formulae which should prove 
of especially great significance in the 
study of complex perceptual phe- 
nomena. 
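The sketch below illustrates what creating stimuli directly from a mathematical formula can mean. It generates the frames of a kinetic depth display from a parametric curve. It is not Green's program, which ran on an IBM 704. The helix, the number of frames, the viewing distance, and the function names are assumptions made for illustration. Each frame is the perspective projection of the dots after a rotation about the vertical axis. Any single frame is therefore a flat pattern, while the sequence carries the three-dimensional form.

```python
import math

def helix(n_dots, turns=2.0):
    """Hypothetical stimulus object: n_dots (>= 2) dots on a helix of radius 1."""
    points = []
    for i in range(n_dots):
        t = i / (n_dots - 1)                      # 0 .. 1 along the curve
        angle = 2 * math.pi * turns * t
        points.append((math.cos(angle), -1.0 + 2.0 * t, math.sin(angle)))
    return points

def kinetic_depth_frames(points, n_frames, viewing_distance=4.0, f=1.0):
    """Rotate the dots about the vertical axis and project each pose centrally."""
    frames = []
    for k in range(n_frames):
        theta = 2 * math.pi * k / n_frames
        frame = []
        for x, y, z in points:
            xr = x * math.cos(theta) + z * math.sin(theta)
            zr = -x * math.sin(theta) + z * math.cos(theta)
            depth = viewing_distance + zr         # distance from the eye, > 0 here
            frame.append((f * xr / depth, f * y / depth))
        frames.append(frame)
    return frames

dots = helix(40)
frames = kinetic_depth_frames(dots, n_frames=36)
print(len(frames), "frames of", len(frames[0]), "dots each")
```

Changing the formula in `helix` changes the object without any physical model being built. The contribution stressed above lies in this step from formula to display.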


REFERENCES

Gibson, J. J. Visually controlled locomotion
and visual orientation in animals. Brit. J.
Psychol., 1958, 49, 182-194.

Gibson, J. J., & Gibson, Eleanor J. Con-
tinuous perspective transformations and
the perception of rigid motion. J. exp.
Psychol., 1957, 54, 129-138.

Green, B. F. The use of high-speed digital
computers in studies of form perception.
In J. W. Wulfeck and J. H. Taylor (Eds.),
Form discrimination as related to military
problems. Washington, D. C.: NRC, 1957.

Green, B. F. Kinetic depth effect. In
Psychology Group 58: Quarterly progress re-
port. Cambridge: Massachusetts Institute
of Technology, Lincoln Laboratory, March
1959. (a)

Green, B. F. Mathematical notes on 3-D
rotations, 2-D perspective transformations,
and dot configurations. (Group Rep. No.
58-5) Cambridge: Massachusetts Institute
of Technology, Lincoln Laboratory, July 
1959. (b) 

International Business Machines Corpo-
ration. 704 electronic data-processing
machine: Manual of operation. New York:
IBM, 1955.

Johansson, G. Configurations in event per-
ception. Uppsala: Almquist & Wicksells,
1950.

Johnson, G. L. Two curious optical illusions.
Arch. Ophthal., Chicago, 1927, 56, 465-
468.

Kenyon, F. C. A curious optical illusion con-
nected with an electric fan. Science, 1898,
8, 371-372.

Koffka, K. Some problems of space percep-
tion. In C. Murchison (Ed.), Psychologies
of 1930. Worcester: Clark Univer. Press,
1930.

Koffka, K. Principles of gestalt psychology.
New York: Harcourt-Brace, 1935.

Langdon, J. The perception of a changing
shape. Quart. J. exp. Psychol., 1951, 3,
157-165.

Metzger, W. Beobachtungen über phäno-
menale Identität. Psychol. Forsch., 1934,
19, 1-60. (a)

Metzger, W. Tiefenerscheinungen in opti-
schen Bewegungsfeldern. Psychol. Forsch.,
1934, 20, 195-260. (b)

Metzger, W. Gesetze des Sehens. Frankfurt:
Waldemar Kramer, 1953.

Miles, W. R. Figure for the “windmill illu-
sion.” J. gen. Psychol., 1929, 2, 143-145.

Miles, W. R. Movement interpretations of
the silhouette of a revolving fan. Amer. J.
Psychol., 1931, 43, 392-405.


Musatti, C. L. Sui fenomeni stereocinetici.
Arch. Ital. Psicol., 1924, 3, 105-120.

Musatti, C. L. Forma e assimilazione. Arch.
Ital. Psicol., 1931, 9, 61-156.

Philip, B. R., & Fisichelli, V. R. Effect of
speed of rotation and complexity of pattern
on the reversals of apparent movement in
Lissajous figures. Amer. J. Psychol., 1945,
58, 530-539.

Pitts, W., & McCulloch, W. S. How we
know universals: The perception of auditory
and visual forms. Bull. math. Biophys.,
1947, 9, 127-147.

Siporsky, R. C. Absolute judgments of static
perspective transformations. J. exp. Psy-
chol., 1958, 56, 380-384.

Wallach, H., & O’Connell, D. N. The kinet-
ic depth effect. J. exp. Psychol., 1953, 45,
205-217.

Wallach, H., O’Connell, D. N., & Neis-
ser, U. The memory effect of visual percep-
tion of three-dimensional form. J. exp.
Psychol., 1953, 45, 360-368.

Wallach, H., Weisz, A., & Adams, P.
Circles and derived figures in rotation.
Amer. J. Psychol., 1956, 69, 48-59.

Weber, C. O. Apparent movement in
Lissajous figures. Amer. J. Psychol.,
1930, 42, 647-649.

White, B. J., & Mueser, G. E. Accuracy in
reconstructing the arrangement of elements
generating kinetic depth displays. J. exp.
Psychol., 1960, 60, 1-11.


(Received May 8, 1961) 





Psychological Bulletin 
1962, Vol. 59, No. 5, 434-448 


COLOR VISION RESEARCH AND THE 
TRICHROMATIC THEORY: 
Scientific interest in color began in
the latter part of the seventeenth 
century with the research of Newton 
(1672, 1757) on light and colors. Of 
historical significance was his dis- 
covery that white (or grey) and all 
other colors are, in fact, reproducible 
with a mixture of two or more kinds 
of homogeneous light (Newton, 
1704). Though his concepts regard- 
ing the perception of colors were tra- 
ditional in the sense that physical 
properties of objects were transmit- 
ted to the sensorium, it can be said 
that he influenced subsequent theo- 
ries by assigning a definite role to the 
optic nerve fibres. In a letter to the
Royal Society in 1675 (cf. Newton,
1757) he suggested that light rays
excited vibrations in the retinal
terminations of the optic nerve, that 
the vibrations were transmitted by 
the optic nerve fibres to the sen- 
sorium, and that here the different 
colors were experienced according to 
the strength and mixture of their 
vibrations. In a fancied analogy with
the notes of a musical scale, Newton 
supposed there were seven kinds of 
light, each with its characteristic 
vibration rate. 

1 Supported by a contract between the
Office of Naval Research and Columbia Uni-

During the eighteenth century
some physicists expressed the opinion
that a minimum of three elementary
or primary colors were necessary to
reproduce all the known colors 
(Helmholtz, 1867, 1924-25).  Ex- 
perimenting with mixtures of colored 
pigments they concluded that red, 
yellow, and blue were the primary 
colors, green being excluded since it 
was obtainable with mixtures of yel- 
low and blue substances. Wunsch 
(Göthlin, 1943), experimenting with
mixtures of spectral lights, concluded 
that red, green, and violet were the 
primary colors. In 1758, the mathe- 
matician Tobias Mayer put forward 
the view that all spectral lights were 
mixtures of three kinds of light, 
namely, red, yellow, and blue. In 
1777, Giros de Gentilly (1785) argued 
not only for a trichromasy of physical 
light but also for a trichromatic phys- 
iological mechanism in the retina. 
He speculated that color perception 
was mediated by three types of mem- 
branes or molecules, each selectively 
sensitive to one of the three kinds of 
light. He made the original sugges- 
tion that defective color vision was 
due to the inactivity of one of the 
membranes or groups of molecules. 

By the end of the eighteenth cen- 
tury it had become quite clear that 
although perceived colors may be 
limited, white light actually consists 
of an infinite number of rays, differ- 
ing from each other in color and 
refrangibility. In his discussion of the 
physical theory of light, Thomas 
Young (1802b) found it necessary to 
modify Newton’s theory concerning 
the perception of colors. He pointed 
out that since: 
it is almost impossible to conceive of each
sensitive point of the retina to contain an in- 
finite number of particles, each capable of 
vibrating in perfect unison with every possible 
undulation, it becomes necessary to suppose 
the number limited; for instance, to the three 
principal colors red, yellow and blue. 


It was possible, he said, that each 
sensitive filament of the nerve con- 
sisted of three portions, one for each 
principal color. A year later, Wol- 
laston’s (1802) description of the 
spectrum led Young (1802a, 1807a, 
1807b) to modify his previous re- 
marks regarding “‘the proportions of 
the sympathetic fibres of the retina.” 
He now substituted red, green, and 
violet for red, yellow, and blue.

In 1794, the distinguished chemist 
John Dalton (1798) presented to the
Manchester Literary and Philosophi-
cal Society a dramatic account of his
own peculiar vision for colors. While 
it is certain that such persons with 
defective color vision were not new to 
the human race, recorded instances of 
them can be traced only to the seven-
teenth and eighteenth centuries
(Huddart, 1777; Turberville, 1684; 
Whisson, 1778). Dalton’s testimony 
stimulated scientific interest in the 
phenomenon of color blindness. He 
said: “In the solar spectrum three 
colors appear; yellow, blue and pur- 
ple. The two former make a con- 
trast; the two latter seem to differ 
more in degree than in kind.” (What 
basis Dalton used for naming his 
colors is a mystery! The names are 
almost but not quite what one would 
expect from a unilaterally deuteran-
opic subject who can give “normal”
names to colors seen in the color- 
blind eye [Graham & Hsia, 1958].) 
Dalton supposed his anomaly was 
due to a color medium in his eye 
which absorbed only the red and 
green rays. At his request, a post- 
mortem examination of his eyes was 
made (Henry, 1854) and his theory
disproved. 

Thomas Young (1807a, 1807b) 
ascribed Dalton’s peculiar color vi-
sion to “the absence or paralysis of
those fibres of the retina which are
calculated to perceive red.” A con-
troversy immediately arose as to
whether Dalton and others like him 
were totally insensitive to the red end 
of the spectrum (Seebeck, 1837; Wil- 
son, 1855). Dalton’s statements had 
failed to make this point clear. 

Of the early attempts to classify 
color-blind persons (Purkinje, 1828; 
Seebeck, 1837; Szokalski, 1841; 
Wartmann, 1846; Wilson, 1855), that 
of Seebeck received the most atten-
tion. He used the first formal screen-
ing tests for the classification of color
vision types. Though red-green con- 
fusion was common to all his observ- 
ers, they fell into two groups regard- 
ing the visible limits of the solar 
spectrum. The first group reported 
seeing the normal limits and called 
the spectral colors “blue” and “red.”
The second group was relatively in-
sensitive to the red end of the spec-
trum and called its colors “blue” and
“yellow.” We know today that color
names given by color-blind persons 
are unreliable, depending, as they 
do, on learned cues of brightness, 
saturation, texture, or position. The 
only justifiable method of identifying 
the colors these persons see in the 
spectrum is to obtain certain specific 
types of experimental data from uni- 
laterally color-blind persons with 
normal color vision in the “good” eye
(Graham & Hsia, 1958). 

In a letter to Dalton in 1833, Her- 
schel made some significant contribu- 
tions to color theory (Henry, 1854). 
He introduced the following defini- 
tion of normal and color-blind vision: 
In normal color vision all colors can 
be referred to a mixture of three pri-
maries, while in color-blind vision all
colors are referable to a mixture of
two primaries. Herschel wrote,
“Now, to eyes of your kind, it seems
to me that all your tints are referable
to two, which I shall call A and B; the
equilibrium of A and B producing
your white. . . .” Another color-
blind scientist, William Pole (1857),
interpreted this statement to mean
that persons like him should see grey
(or white) in the place of green.
Later, James Clerk Maxwell noted
(1855a) that his color-blind subject
saw a white in the blue-green band of
the spectrum. This band, which came
to be known as “the neutral point,”
has played an important role in color 
theory and the diagnosis of color 
blindness. 

In Herschel’s letter there also ap-
peared for the first time the terms
“dichromic vision” and “dichroma-
tism” (Henry, 1854). To him these
terms described two-color vision,
that is, vision for blue and yellow.
Though he introduced both terms at
the same time, dichromic was gener-
ally adopted and used for several
years. Wartmann (1846) confused
the issue with his “dichromatic Dal-
tonism,” which referred to vision
where all colors are seen as shades of
grey. The term dichromatic finally
superseded dichromic when Donders
(1884) and König (1903) applied it to
the more frequent types of color-
blind persons who require a mixture
of two monochromatic lights to
match the spectral colors.

By the middle of the nineteenth 
century many speculations concern-
ing color perception had been ad-
vanced (Wartmann, 1846; Wilson,
1855). In 1852, Helmholtz revived 
Young's theory in connection with 
his well-known experiments concern- 
ing spectral complementaries (Helm- 
holtz, 1852), only to reject or ignore 
it (Helmholtz, 1853, 1855). An im-
portant outcome of these experiments
was his explanation as to why the 
laws of color mixture sometimes 
break down for mixed pigments. For 
example, pigment mixtures of blue 
and yellow give you green, but a mix- 
ture of blue and yellow spectral lights 
gives you grey or white. He pointed 
out that the final wave lengths re- 
flected by colored substances are 
those wave lengths of incident light 
that remain after successive selective 
absorptions by the colored media 
(Helmholtz, 1852). Failure to under- 
stand this principle vitiated many of 
the interpretations of workers before 
Helmholtz. 
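Helmholtz's principle can be illustrated with idealized spectra. In the sketch below the two "pigments" are hypothetical band-pass filters sampled every 10 mμ, not measured reflectances. In a pigment mixture, light passes both absorbing media in succession, so their transmittances multiply wave length by wave length. In a mixture of lights, the energies simply add. Only the overlap of the two bands, in the green, survives the pigment mixture. The light mixture keeps the whole spectrum. Whether it actually looks grey or white depends on the proportions, which this toy model ignores.

```python
WAVELENGTHS = list(range(400, 710, 10))  # mμ, sampled every 10 mμ

def band(lo, hi):
    """Hypothetical idealized pigment: transmits 1.0 inside [lo, hi], 0.0 outside."""
    return [1.0 if lo <= w <= hi else 0.0 for w in WAVELENGTHS]

blue_pigment = band(400, 520)
yellow_pigment = band(500, 700)

# Pigment mixture: successive selective absorptions, so transmittances multiply.
pigment_mix = [b * y for b, y in zip(blue_pigment, yellow_pigment)]

# Mixture of lights: the energies add.
light_mix = [b + y for b, y in zip(blue_pigment, yellow_pigment)]

def passband(spectrum):
    """Wave lengths at which any energy remains."""
    return [w for w, v in zip(WAVELENGTHS, spectrum) if v > 0]

print(passband(pigment_mix))                                # [500, 510, 520]
print(min(passband(light_mix)), max(passband(light_mix)))   # 400 700
```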


COLOR MIXTURE


Maxwell’s early color mixture ex- 
periments (which date from 1852) 
were conducted with rotating disks of 
pigment colors (Maxwell, 1855a, 
1855b) and were followed by experi- 
ments with spectral colors (Maxwell, 
1860). Maxwell was responsible for 
the resuscitation of Young’s theory, 
for from the years 1855 to 1860, he 
repeatedly worked out the implica- 
tions of color mixture and color 
blindness for this theory. In 1855, he 
described theoretical response curves 
for red, green, and violet nerve sys- 
tems (Maxwell, 1855a) and in 1860, 
computed three such curves from ex- 
perimental data for normal observ- 
ers. As predicted by Herschel, his 
color-blind observer required only 
two primaries to match the colors of 
the spectrum. Consequently, color- 
blind vision was represented graphi- 
cally with two response curves (Max- 
well, 1860). It was only in 1860 that 
Helmholtz came out firmly in sup- 
port of Young’s theory (Helmholtz, 
1867) and continued to be its cham- 
pion until his death in 1894. Max- 
well’s publications on color vision 
ceased with the year 1871. 
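Herschel's definition and Maxwell's response curves can both be written as a linear model. This is an assumption of the sketch, not a formula given in the text. A match holds when the test light and the mixture of primaries produce the same response in each of the three response systems, or in each of two for a dichromat. The response values below are hypothetical numbers, not Maxwell's data. Solving the linear system gives the amount of each primary. By the usual convention of color-matching experiments, a negative amount would mean that primary must be added to the test field instead of to the mixture.

```python
def solve(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    a = [list(row) + [b] for row, b in zip(matrix, rhs)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n + 1):
                a[r][c] -= factor * a[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (a[r][n] - sum(a[r][c] * x[c] for c in range(r + 1, n))) / a[r][r]
    return x

def responses(matrix, amounts):
    """Response of each system to a mixture: a weighted sum over the primaries."""
    return [sum(m * q for m, q in zip(row, amounts)) for row in matrix]

# Hypothetical trichromat: rows are the "red", "green", "violet" response systems;
# columns are the red, green, and violet primaries.
TRICHROMAT = [[0.9, 0.2, 0.0],
              [0.3, 0.8, 0.1],
              [0.0, 0.1, 0.9]]
test_light = [0.55, 0.57, 0.23]
amounts = solve(TRICHROMAT, test_light)
print("trichromat needs", [round(q, 3) for q in amounts])

# Hypothetical dichromat with only two response systems: two primaries suffice.
DICHROMAT = [[0.9, 0.1],
             [0.1, 0.9]]
print("dichromat needs", [round(q, 3) for q in solve(DICHROMAT, [0.55, 0.23])])
```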

Maxwell (1860) was influenced by 
Brewster in the graphical representa-
tion of his curves of “primary intensi-
ties” and their relationship to the
spectral brightness or luminosity 
curve. Brewster (1834), like Mayer, 
supported a triple-spectrum theory 
and his views received much atten- 
tion (Helmholtz, 1924-25). Helm- 
holtz’s (1867) diagrams of the theo- 
retical response curves which first ap- 
peared in 1860 bear a striking resem- 
blance to Brewster’s diagrams of 
separate and superposed intensity 
curves of the triple spectrum.

Maxwell popularized the use of an
equilateral triangle to describe and
predict the data of color mixture.
Such geometrical representations orig-
inated with Newton (1704), who used
a circle to illustrate the results of
color mixture. Later, Tobias Mayer
used an equilateral triangle for the
same purpose (Forbes, 1849). The
chromaticity diagram of today is
a right-angle triangle that represents
approximately the hue and saturation
of all colors in terms of the three pri- 
maries red, green, and blue. 
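The text gives no formula for such a diagram. The usual construction, assumed here, describes each color by the amounts R, G, B of the three primaries needed to match it and plots only their proportions: r = R/(R + G + B) and g = G/(R + G + B), with b = 1 - r - g implied. Plotting g against r puts the three primaries at the corners of a right-angle triangle. The tristimulus values in the sketch below are hypothetical.

```python
def chromaticity(R, G, B):
    """Return (r, g) chromaticity coordinates; b = 1 - r - g is implied."""
    total = R + G + B
    if total == 0:
        raise ValueError("a black stimulus has no chromaticity")
    return (R / total, G / total)

print(chromaticity(1.0, 1.0, 1.0))   # equal amounts of the three primaries
print(chromaticity(2.0, 2.0, 2.0))   # twice as intense, same point on the diagram
print(chromaticity(-0.1, 0.8, 0.3))  # a negative amount lies outside the triangle
```

Because only proportions are kept, intensity drops out, which is why the diagram can represent only hue and saturation.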
Maxwell’s pioneer color mixture 
experiments are not precise by mod- 
ern standards. They were repeated 
by König and Dieterici (1886) and
Abney (1913) who made a more sys- 
tematic study of the problem with 
normal and color-blind observers. 
Their data were made the basis for 
the three response or excitation 
curves that were recommended as 
standard in 1922 by the Optical So- 
ciety of America (Troland, 1922). 
The observations of König and
Dieterici, and Abney, however, left 
something to be desired both in the 
apparatus used and in the number of 
observers studied (Wright, 1946). 
About 25 years ago, careful rede- 
terminations were made independ- 
ently by Guild (1931) and Wright 
(1929) and were found to be in agree- 
ment. Their results provided the basis 
for new standard trichromatic mix-
ture curves which were adopted in
1931 (Commission Internationale de 
l’Eclairage, 1932). For dichromatic 
color mixture, Pitt’s (1935) data are 
considered the most precise. 


LUMINOSITY 

Newton (1704) had observed: 
The most luminous of the Prismatic Colours 
are the yellow and orange. These affect the 
senses more strongly than all the rest together, 
and next to these in strength are the red and 
green. The blue compared with these is a faint 
and dark Colour, and the indigo and violet are 
much darker and fainter, so that these com- 
pared with the stronger Colours are little to be 
regarded. 


About 100 years later, Fraunhofer 
(1824) attempted to measure the 
relative brightness of spectral lights. 
Until 1883, however, all luminosity
measures were relative, since methods 
to measure the absolute energy of 
spectral wave lengths did not exist. 
In 1888, Langley published luminos- 
ity curves for three observers, and he 
plotted, against wave length, the 
absolute energies that would give 
equal acuity, and presumably equal 
brightness (Langley, 1888). He 
started the modern practice of taking 
the reciprocals of these energies as 
measures of the sensitivity of the eye, 
with the result that a “bell-shaped”
curve was obtained. 
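Langley's practice of taking reciprocals can be sketched as follows. The threshold energies below are hypothetical values in arbitrary units, chosen only to show the shape. Wave lengths that need less energy receive higher sensitivity. Expressing each value relative to the most sensitive wave length then yields a curve that rises to a single peak and falls away on both sides.

```python
# Hypothetical threshold energies (arbitrary units) at a few wave lengths in mμ.
threshold_energy = {450: 8.0, 500: 2.5, 550: 1.0, 600: 2.0, 650: 10.0}

# Sensitivity is the reciprocal of the energy needed for a criterion effect.
sensitivity = {wl: 1.0 / e for wl, e in threshold_energy.items()}

# Express relative to the most sensitive wave length (peak = 1.0).
peak = max(sensitivity.values())
relative = {wl: s / peak for wl, s in sensitivity.items()}

for wl in sorted(relative):
    print(wl, round(relative[wl], 3))
```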

Modern methods of obtaining lumi-
nosity data involve either
heterochromatic brightness matches
(Gibson & Tyndall, 1923; Ives, 1912)
or absolute threshold measurements
(Graham & Hsia, 1954; Hecht &
Hsia, 1947). Luminosity curves are
not directly comparable since energy 
distributions differ materially in the 
spectra of the various sources used. 
To make such curves comparable, it 
is common practice arbitrarily to cor- 
rect them for an equal energy distri- 
bution. 
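The correction for an equal-energy spectrum amounts to dividing out the energy distribution of the source. The sketch below assumes that each measured luminosity value is proportional to the product of the eye's sensitivity and the relative energy the source emits at that wave length. Both dictionaries are hypothetical values, not data from the studies cited.

```python
# Hypothetical luminosity readings obtained with a source whose energy rises toward the red.
measured = {450: 0.03, 500: 0.20, 550: 0.70, 600: 0.54, 650: 0.10}
source_energy = {450: 0.3, 500: 0.5, 550: 0.7, 600: 0.9, 650: 1.0}

def equal_energy_correction(measured, source_energy):
    """Divide out the source's relative energy, then rescale so the peak is 1.0."""
    corrected = {wl: measured[wl] / source_energy[wl] for wl in measured}
    peak = max(corrected.values())
    return {wl: v / peak for wl, v in corrected.items()}

corrected = equal_energy_correction(measured, source_energy)
for wl in sorted(corrected):
    print(wl, round(measured[wl], 3), round(corrected[wl], 3))
```

In this example the long-wave values fall relative to the peak after correction, because the hypothetical source was richer in energy there.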

Gibson and Tyndall (1923) made a
graphical compilation of all existing 
luminosity data (for daylight vision) 
where more than 10 observers pro- 
vided the measurements. They de- 
rived a luminosity curve for a total of 
200 observers as representative of 
both the equality-of-brightness and 
the flicker methods. This curve was 
adopted as a standard in 1924 and its 
validity reaffirmed in 1939 (Gibson, 
1940) by the International Commis- 
sion of Illumination. 

Studies of the dichromatic luminos- 
ity function have been reported since 
1879 (Hecht & Shlaer, 1936). It ap- 
peared that Seebeck’s (1837) two 
kinds of color blindness were ac- 
counted for by these researches. 
Some of the dichromats apparently
had a normal luminosity curve with a
maximum around λ555 mμ while
others had a marked loss in sensi-
tivity at the long wave length end
and a maximum around λ540 mμ.
Von Kries (1897) introduced what he
considered to be nontheoretical Greek
names for these two classes of dichro- 
mats, viz., deuteranopes and pro- 
tanopes. Actually, protanopia liter- 
ally means blindness to the first pri- 
mary and deuteranopia, blindness to 
the second, and hence these names 
are not free from a theoretical conno- 
tation. The modern tendency is to 
view the terms protanopia and deu-
teranopia as synonymous with “first
type” and “second type,” respec-
tively.

The more recent experiments of 
Pitt (1935) seemed to verify the re- 
sults of the earlier studies of the di- 
chromatic luminosity function. How- 
ever, Hecht and Hsia (1947) and 
Graham and Hsia (1954) have 
pointed out that the arbitrary prac- 
tice of plotting luminosity curves 
with their maxima set at 100% 
masks any differences of shape or 
height among the curves. They de-
termined threshold energies through-
out the spectrum for normal and di-
chromatic observers and did not
adopt the misleading practice of set- 
ting the maxima at 100%. When the 
reciprocals of absolute or relative 
energies are presented this way, 
protanopes show a loss of sensitivity 
for the red and yellow wave lengths 
and deuteranopes a loss for the green 
and blue wave lengths, as compared 
to the normal trichromat. An in- 
creasing number of investigators re- 
port that deuteranopes show a defi- 
nite loss of sensitivity at the short 
wave length end of the spectrum and 
that their luminosity function is un- 
like that of normal observers (Gra- 
ham & Hsia, 1960). 
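The objection raised by Hecht and Hsia and by Graham and Hsia can be made concrete with two hypothetical sets of threshold energies. These are illustrative numbers, not their data. One set is for a normal observer. The other is for a protanope-like observer who needs more energy at the long wave lengths and the same energy at the short ones. On an absolute scale the two curves coincide at the short wave lengths. Once each curve is rescaled so that its own maximum is 100%, the protanope-like curve appears more sensitive there, an artifact of the normalization.

```python
# Hypothetical threshold energies (arbitrary units) at wave lengths in mμ.
normal = {450: 8.0, 500: 2.5, 550: 1.0, 600: 2.0, 650: 10.0}
protan = {450: 8.0, 500: 2.5, 550: 2.0, 600: 8.0, 650: 80.0}

def absolute_sensitivity(thresholds):
    """Reciprocal of threshold energy, kept on a common absolute scale."""
    return {wl: 1.0 / e for wl, e in thresholds.items()}

def percent_of_own_peak(thresholds):
    """The practice criticized above: each curve scaled so its own maximum is 100%."""
    s = absolute_sensitivity(thresholds)
    peak = max(s.values())
    return {wl: 100.0 * v / peak for wl, v in s.items()}

for name, data in (("normal", normal), ("protan", protan)):
    a, p = absolute_sensitivity(data), percent_of_own_peak(data)
    print(name, [round(a[wl], 3) for wl in sorted(a)], [round(p[wl], 1) for wl in sorted(p)])
```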


WAVE LENGTH DISCRIMINATION 


Color vision researchers have also 
used wave length discrimination data 
as a basis for theory. Mandelstamm
reported such data for the
normal eye (1867). Several investi- 
gations of this problem have since ap- 
peared in the literature (Ladekarl, 
1934; Laurens & Hamilton, 1923; 
Wright & Pitt, 1934). The experi- 
mental method most frequently used 
involves a comparison of two con- 
tiguous monochromatic lights of the 
same wave length, first equated for 
color (hue and saturation) as well as 
brightness. The wave length of one 
of the lights is then varied until the 
observer reports a just-perceptible 
color difference. The brightness of 
the wave length being compared is al- 
ways equated before the decision 
about color is made. This procedure 
is adopted to control the Bezold-
Brücke phenomenon (Purdy, 1930,
1937), viz., the fact that spectral 
lights vary in color with changes in 
luminance levels. The difference 
threshold is generally expressed in 
terms of the just perceptible change 
in wave length (Δλ). Some studies ex-
press the mean error of the difference
thresholds as a function of the stand-
ard spectral wave lengths (König &
Dieterici, 1886; Ladekarl, 1934; Lie- 
berman & Marx, 1911). 

When the method of just-percepti- 
ble difference is adopted, the results 
take the form of a curve which ap- 
proaches the base line at parts of the 
spectrum where discrimination is 
good and recedes from it where dis- 
crimination is comparatively poor. 
In other words, the minima in the 
curve correspond to relatively higher 
sensitivities than the maxima. For 
normal observers, wave length dis- 
crimination is largely determined by 
hue differences. Steindler (1906) 
found four regions of minimal thresh- 
olds in the violet, blue-green, yellow, 
and red, the curve taking the shape 
of four successive troughs. She veri- 
fied the existence of the minimum at 
the red end of the spectrum by 
matching the wave lengths for bright-
ness with a Nicol prism and using
red filters to guard against stray light
(p. 50). Jones (1917) and Laurens
and Hamilton (1923) obtained ap-
proximately the same empirical
curve. The results of Wright and Pitt
(1934) were in fair agreement with
the three researches just mentioned
except that the minimum in the red
(around 630 mμ) is missing. Other
modern researches report maximum
sensitivity for wave length discrimi-
nation at only the blue-green and
yellow portions of the spectrum (Cor-
bett, 1937; Ladekarl, 1934; Malling,
1919; Roaf, 1927). In view of these 
varying results it is not surprising 
that a standard wave length dis- 
crimination curve is lacking. 
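Reading such a curve amounts to locating its troughs, since a small Δλ means good discrimination. The sketch below finds the local minima of a Δλ curve. The values are hypothetical, invented to have four troughs like those Steindler described, and are not her data.

```python
# Hypothetical just-perceptible differences (Δλ, in mμ) at standard wave lengths (mμ).
delta_lambda = {430: 3.0, 450: 2.0, 470: 3.5, 490: 1.2, 510: 2.5, 530: 3.0,
                550: 2.0, 570: 1.0, 590: 1.5, 610: 2.2, 630: 1.8, 650: 4.0}

def discrimination_minima(curve):
    """Wave lengths whose threshold is lower than at both neighbors (best discrimination)."""
    wls = sorted(curve)
    return [wls[i] for i in range(1, len(wls) - 1)
            if curve[wls[i]] < curve[wls[i - 1]] and curve[wls[i]] < curve[wls[i + 1]]]

print(discrimination_minima(delta_lambda))  # [450, 490, 570, 630]
```

The four minima fall in the violet, blue-green, yellow, and red. With noisier data a smoothing step would be needed before such a simple neighbor test is meaningful.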
Donders (1884) and König and
Dieterici (1886) were among the 
earliest to report the wave length dis- 
crimination of dichromats. They 
were followed by several others 
(Hecht & Shlaer, 1936; Ladekarl, 
1934). The empirical results have
been remarkably similar. Both pro- 
tanopes and deuteranopes show the 
same reduced capacity for wave 
length discrimination, their thresh- 
old discriminations taking the form 
of a U shaped curve, with a single 
minimum in the blue-green wave 
lengths around 500 mμ. On either
side of this minimum, discrimination 
begins to deteriorate, being extremely 
poor in the violet and green portions 
of the spectrum. No color differences 
can be detected in the long wave
lengths beyond about 530 mμ.*


SATURATION DISCRIMINATION 


In everyday speech the saturation 
of a color is referred to with adjectives 
such as “pale,” “weak,” or “light,”
“strong,” “dark,” or “deep.” Even
after a cursory examination of the 
spectrum, persons with normal color 
vision report that the spectral colors 
appear unequally saturated, the col- 
ors at the extremes being more 
saturated than the colors in the mid- 
dle. Early experimentation with sat- 
uration depended entirely on methods 
involving rotating discs of pigment 
colors (Parsons, 1924). To this day, 
we lack precise experimental data on 
the relative saturation of spectral 
wave lengths, first, because the prob- 
lem of quantifying saturation judg- 
ments has not been satisfactorily re- 
solved and second, because a suitable 
experimental method of obtaining 
these measurements has not yet been 
demonstrated. 

*Balaraman, Hsia, and Graham (1962)
found unreliable (though recordable) wave
length discrimination thresholds for two
deuteranopes and three protanopes in the
long wave lengths. Further improved research
with similar observers might indicate whether
some dichromats see more saturation differ-
ences at the long wave length end than
others, or whether these observers are partial
dichromats.

The attention of experimenters has
hitherto been focused largely on
stimulus variables that control spec- 
tral saturation. One such variable is 
luminance level (Purdy, 1931). As 
you increase the luminance of a dim 
monochromatic light, the saturation 
increases until a level is reached where 
saturation is at a maximum. Further
increases result in the diminution of 
saturation until at certain high lumi- 
nance levels, spectral wave lengths 
lose all color and appear white to the 
observer. 

Another saturation variable is dem- 
onstrated as follows: Starting with 
two equally bright white lights A and 
B, the experimenter adds varying 
amounts of a given monochromatic 
light to A. When the observer re-
ports that A is just-perceptibly
colored, the experimenter begins to 
add varying amounts of the same 
monochromatic light to B. The ob- 
server now indicates when B is just- 
perceptibly more saturated than A. 
By this step-by-step method of com-
parison, so-called “saturation steps”
are measured. When the number of 
saturation steps is plotted against 
wave length, a V shaped curve is ob-
tained with a minimum at about 570
mμ (Martin, Warburton, & Morgan,
1933). Such saturation steps refer to 
a stimulus consisting of spectral wave 
lengths and nonspectral white, thus 
confusing the issue of spectral satura- 
tion. 

In the above experiment, amounts 
of the monochromatic light are added 
in such a way that the total lumi-
nance of the mixture field is main-
tained constant. The procedure re-
quires that when the luminance $L_\lambda$ of
the monochromatic light is increased,
the luminance of the white light $L_w$
must be decreased by the same
change in luminance. It has become
common practice to express the first
saturation step from white in terms of
the ratio of luminances $L_\lambda / (L_w + L_\lambda)$.


SHAKUNTALA BALARAMAN 


This ratio is called the least or mini- 
mum colorimetric purity, or p. The 
reciprocal of p is generally used for 
the reason that it conveniently repre- 
sents a poor discrimination by a low 
value and a better discrimination by 
a high value. When the reciprocal of 
p is plotted against wave length, a V 
shaped curve is obtained, with a 
minimum in the 575 mμ region (Hart-
ridge, 1950; Jameson & Hurvich,
1955). Since this V shaped function 
roughly correlates with the subject’s 
verbal report about the relative satu- 
ration of spectral wave lengths, it has 
become customary to speak of colori- 
metric purity as a derived measure of 
saturation discrimination. Actually, 
p cannot be a measure of the relative 
saturation of spectral wave lengths
since by definition p has a value of 1.0
for any spectral light that is added to 
white light (Graham, 1959). 
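The purity measure defined above can be computed directly. In this sketch the total luminance of the mixture field is held at a hypothetical 10 units, and the threshold luminances of the monochromatic component are invented for illustration. A large 1/p means that a small admixture of the spectral light is noticed. In this example 1/p reaches its lowest value in the yellow, in the spirit of the V-shaped function. The last line shows Graham's point: an unmixed spectral light has p = 1.0 whatever its wave length.

```python
TOTAL_LUMINANCE = 10.0  # hypothetical constant luminance of the mixture field

# Hypothetical luminance of the monochromatic component at the first just-perceptible step.
threshold_l_lambda = {450: 0.05, 500: 0.3, 575: 2.0, 620: 0.5, 660: 0.2}

def colorimetric_purity(l_lambda, l_white):
    """p = L_lambda / (L_white + L_lambda)."""
    return l_lambda / (l_white + l_lambda)

reciprocal_p = {}
for wl, l_lambda in threshold_l_lambda.items():
    l_white = TOTAL_LUMINANCE - l_lambda   # white is reduced by the same amount
    reciprocal_p[wl] = 1.0 / colorimetric_purity(l_lambda, l_white)

for wl in sorted(reciprocal_p):
    print(wl, round(reciprocal_p[wl], 1))
print("poorest discrimination near", min(reciprocal_p, key=reciprocal_p.get), "mμ")
print("purity of an unmixed spectral light:", colorimetric_purity(4.0, 0.0))
```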

For the dichromats, the reciprocal 
of p increases rapidly from the neu- 
tral point and towards the short wave 
lengths. The function rises rapidly at 
first from the neutral point and 
towards the longer wave lengths up 
to 530 mμ where it gradually levels
off. Minimum p values occur near 
500 mμ for deuteranopes and 490 mμ
for protanopes (Chapanis, 1944; 
Hecht & Shlaer, 1936). The color- 
imetric purity function for both types 
of dichromat is indeterminate at the 
neutral point. This is an expected 
phenomenon since dichromats see a 
narrow band of wave lengths near 
500 mμ as white or neutral in color.
An infinite amount of such wave 
lengths can be added to a white light 
without producing any color change 
for these observers. 


THE ANOMALOUS TRICHROMATS 


In 1881, while performing color 
mixture experiments with subjects 
thought to be normal, the third Lord 
Rayleigh (1882) found wide variations in the ratio of red to green lights mixed to match a yellow light. Though each of his 23 subjects required one particular mixture of red and green to match the yellow, five of them required more green and two of them more red than the others. Their match was not acceptable to the normal eye; it appeared too greenish or reddish. Rayleigh's dichromats
could match a full red or green with 
the yellow by merely adjusting the 
luminance of the yellow. Later 
(Rayleigh, 1890) he found an observ- 
er who could match the green and 
not the red with the yellow. It was of 
this observer that he remarked, "It looked as though the third color sensation, presumably red, was defective, but not absolutely missing." Historically, Rayleigh's first two divergent groups came to be known as anomalous trichromats, and in line with trichromatic theory it was believed that such persons were either "green-weak" (deuteranomalous) or "red-weak" (protanomalous).

We know today that the frequency 
distribution of the red/green ratios 
required by a large sample of normal 
observers takes the form of a normal 
probability curve with certain cases 
falling outside of the limits of this 
curve. There is no general agreement 
as to what red/green ratios should be 
called anomalous. Some workers 
apply the term to extreme variants 
within the normal curve (Edridge- 
Green, 1913; Nelson, 1938) and 
others reserve it for cases that lie out- 
side of the normal curve limits (Hailwood
& Roaf, 1937; Nelson, 1938;
Schmidt, 1955; Schuster, 1890). 
Anomalous observers also differ in 
their range of matches for the yellow 
(Jameson & Hurvich, 1956). 
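The disagreement over where "anomalous" begins amounts to a choice of cutoff on the distribution of Rayleigh-match ratios. The sketch below assumes a z-score rule with a hypothetical limit of 3 standard deviations and a small invented normal sample; it shows only that the same setting can be labelled differently under different conventions.

```python
from statistics import mean, stdev


def classify_rayleigh_match(red_green_ratio: float, normal_ratios: list[float],
                            limit_sd: float = 3.0) -> str:
    """Label a red/green match ratio relative to a sample of normal observers.

    `limit_sd` is a hypothetical cutoff; the literature does not agree on one.
    """
    mu = mean(normal_ratios)
    sd = stdev(normal_ratios)
    z = (red_green_ratio - mu) / sd
    if z > limit_sd:
        return "protanomalous-type"  # needs more red than normal ("red-weak")
    if z < -limit_sd:
        return "deuteranomalous-type"  # needs more green than normal ("green-weak")
    return "within normal limits"


# Hypothetical red/green ratios from a small normal sample (not real data).
NORMAL_SAMPLE = [0.95, 1.00, 1.05, 0.98, 1.02, 1.00, 0.97, 1.03]
for ratio in (1.01, 1.5, 0.6):
    print(ratio, classify_rayleigh_match(ratio, NORMAL_SAMPLE))
```

Narrowing `limit_sd` moves extreme variants inside the normal curve into the anomalous class, which is the first of the two conventions described above.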

Anomalous observers generally re- 
quire three lights or primaries to 
duplicate spectral colors but in pro- 
portions which differ from those of 
the normal trichromats (McKeon & Wright, 1940; Nelson, 1938). Others who find two primaries sufficient to match certain ranges of the spectrum are frequently called "the extreme anomalous" or "partial dichromats."
Such persons have a neutral point in 
the blue-green wave lengths (Abney, 
1895; Hayes, 1911; Koffka, 1909; 
Rosenkrantz, 1926). 

McKeon and Wright (1940) re- 
ported a marked loss in luminosity at 
the red wave lengths for their pro- 
tanomalous observers, similar to the 
luminosity losses of the protanopes. 
It is not clear whether such a marked 
loss for the protanomalous is an em- 
pirical fact or an artifact arising from 
the convention of naming maximum 
sensitivity 100%. Wright (1946) is 
inclined to believe that these cases 
represent extreme protanomalous ob- 
servers and that his sample was not 
representative of various degrees of 
the defect. 
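How a normalization convention alone could exaggerate a loss is easy to show with invented numbers. In this sketch the absolute sensitivities are hypothetical: the anomalous curve has a higher peak, shifted toward shorter wave lengths, and is only 10% below normal at 650 mμ in absolute terms. Rescaling each curve so that its own maximum reads 100% makes the loss look larger. It illustrates the possibility raised above, not what McKeon and Wright's data actually contain.

```python
def normalize_to_peak(curve: dict[int, float]) -> dict[int, float]:
    """Rescale a luminosity curve so that its maximum reads 100%."""
    peak = max(curve.values())
    return {wl: 100.0 * v / peak for wl, v in curve.items()}


# Hypothetical absolute sensitivities (arbitrary units), keyed by wave length in mμ.
normal = {540: 0.9, 560: 1.0, 650: 0.2}
anomalous = {540: 1.2, 560: 1.1, 650: 0.18}  # higher peak, shifted toward shorter wave lengths

absolute_ratio = anomalous[650] / normal[650]
relative_ratio = normalize_to_peak(anomalous)[650] / normalize_to_peak(normal)[650]
print(f"absolute loss at 650 mμ: {1 - absolute_ratio:.0%}")
print(f"apparent loss after peak normalization: {1 - relative_ratio:.0%}")
```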

The results of Pitt (1935, Appendix I)
and Nelson (1938) are not in
agreement concerning the deutera- 
nomalous luminosity function. Pitt 
found luminosity losses in the blue 
and the green wave lengths but 
luminosity increases at the yellow and 
orange wave lengths, as compared to 
the normal luminosity curve. Nelson 
believes his results indicate an almost
normal luminosity function for the
deuteranomalous. 

The wave length discrimination of 
anomalous observers is described by 
a two-minima curve (Engelking, 
1926; McKeon & Wright, 1940; Nelson, 1938). For both protanomalous and deuteranomalous observers, maximum sensitivity is recorded in the blue-green wave lengths around 500 mμ. The second minimum, in the longer wave lengths, is higher than the first, indicating poorer discrimination. It appears in the yellow (580 mμ) for the protanomalous and in the orange (620 mμ) for the deuteranomalous.
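A two-minima curve can be read off a table of just-noticeable wave-length differences by looking for points lower than both neighbors. The values below are hypothetical, shaped after the protanomalous description above (deepest minimum near 500 mμ, a shallower one near 580 mμ); a lower threshold means finer discrimination.

```python
def local_minima(delta_lambda: dict[int, float]) -> list[int]:
    """Wave lengths where the discrimination threshold is lower than at both neighbors."""
    wls = sorted(delta_lambda)
    return [
        wl
        for prev, wl, nxt in zip(wls, wls[1:], wls[2:])
        if delta_lambda[wl] < delta_lambda[prev] and delta_lambda[wl] < delta_lambda[nxt]
    ]


# Hypothetical just-noticeable differences (mμ) for a protanomalous observer.
PROTANOMALOUS = {460: 6.0, 480: 3.0, 500: 1.5, 520: 4.0, 540: 6.0,
                 560: 4.0, 580: 3.0, 600: 5.0, 620: 8.0}
minima = local_minima(PROTANOMALOUS)
print(minima)
```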

The colorimetric purity data for the anomalous also differ from those of the normal (Chapanis, 1944; McKeon & Wright, 1940; Nelson, 1938).
In general, the values of 1/p tend to 
be smaller throughout the spectrum, 
again indicating poorer discrimina- 
tion. Nelson (1938) and McKeon and 
Wright (1940) found much smaller 
variations in the curve and no marked 
minimum as compared to the normal 
observer. According to Chapanis 
(1944) protanomalous observers re- 
semble protanopes in this function 
except for a secondary dip in the 560
mμ region.


THE CLASSIFICATION OF COLOR VISION TYPES

Modern classifications of color 
vision types rely heavily upon the 
data of color mixture since there is 
either lack of agreement or insufficient
research concerning other basic
visual functions. It has become ac- 
cepted practice to define color vision 
types according to the number of pri- 
maries necessary to reproduce spec- 
tral colors. The normal trichromat 
requires three primaries to do this 
while the dichromat requires two pri- 
maries. Dichromats are today sub- 
divided into three classes: prota- 
nopes, deuteranopes, and tritanopes. 
Modern research with tritanopes has 
not advanced as far as it has for the 
other two classes of dichromats 
(Wright, 1952). Anomalous trichro- 
mats generally require three pri- 
maries to duplicate spectral colors 
but in proportions which differ from 
those of the normal trichromats (Mc- 
Keon & Wright, 1940; Nelson, 1938). 
They are subdivided into three 
classes: protanomalous, deuteranomalous, and tritanomalous. Some rare individuals can reproduce all spectral colors with a single spectral light or primary. They are called
monochromats and on the basis of 
their luminosity data are subdivided 
into two or more classes (Judd, 1943; 
Pitt, 1944). 
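The classification just described is essentially a lookup on the number of primaries needed, plus whether their proportions are normal. The sketch below encodes it; the `axis` argument is a hypothetical stand-in for whatever further data (luminosity, neutral point, and so on) decide between the protan, deutan, and tritan subclasses, and the monochromat subclasses are not modelled.

```python
from typing import Optional

# Word stems for the three subclasses named in the text.
_PREFIX = {"protan": "prot", "deutan": "deuter", "tritan": "trit"}


def classify_color_vision(n_primaries: int, normal_proportions: bool = True,
                          axis: Optional[str] = None) -> str:
    """Name a color vision type from the number of primaries needed to match the spectrum."""
    if n_primaries == 3 and normal_proportions:
        return "normal trichromat"
    if n_primaries == 1:
        return "monochromat"
    if axis not in _PREFIX:
        raise ValueError("a protan, deutan or tritan axis is needed to name the subclass")
    stem = _PREFIX[axis]
    if n_primaries == 3:
        return stem + "anomalous"  # three primaries, abnormal proportions
    if n_primaries == 2:
        return stem + "anope"  # dichromat
    raise ValueError("expected 1, 2 or 3 primaries")


print(classify_color_vision(3, normal_proportions=False, axis="deutan"))
```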

It has been estimated that less than 0.5% of females have defective color vision. The incidence among males is approximately 8%, made up of about 6% anomalous trichromats and 2% dichromats (Wright, 1946).


THE TRICHROMATIC THEORY 
Normal Color Vision 


As we have already noted, the es- 
sential aspect of this theory is the 
concept that there are three sets of 
cone mechanisms in the fovea, each 
with a given spectral sensitivity. The 
red receptors are said to be maximally 
sensitive to the red wave lengths, the 
green receptors to the green wave 
lengths, and the blue receptors to the 
blue wave lengths. The complete ab- 
sence of histological evidence for 
three types of cones is not held 
against the theory. It is argued that 
the triple subdivision may occur in 
the cones on a submicroscopical
scale. It is hoped that work such as 
Granit’s (1947) electrophysiological 
research and Rushton’s (1957, 1958) 
research on the foveal pigments of 
normal and color-blind eyes may es- 
tablish such a hypothesis on an ob- 
jective basis. 

The fundamental response curves* are said to be characteristic of the three hypothetical receptor systems. The trichromatic theory postulates that spectral hue is dependent on the ratio, and spectral brightness (or luminosity) on the sum, of the ordinates at any given wave length. "White" is experienced when all three receptor mechanisms are stimulated in appropriate ratios (usually equal), and this latter condition is assumed when all three ordinates are equal. The saturation of perceived color is influenced by the ratio of the "white-producing" ordinates to the remaining ordinate or ordinates. When the red and green receptors are equally stimulated (and this is assumed when the red and green ordinates are equal), a "yellow" experience is initiated in the brain.

* Fundamental response curves are derived from spectral mixture curves. The latter represent brightnesses of the primaries required to match the brightness of a given spectral wave length (from a light source of known wave length distribution).
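The three postulates (hue from the ratio of the ordinates, brightness from their sum, saturation from the share outside the equal "white-producing" part) can be written out directly. This is a sketch under one stated simplification: the white part is taken to be three times the smallest ordinate, which is an illustrative assumption rather than a formula given by the theory's proponents.

```python
def trichromatic_description(r: float, g: float, b: float):
    """Summarize fundamental-response ordinates at one wave length.

    brightness: sum of the ordinates
    hue:        their ratios, expressed as proportions of the sum
    saturation: share of the response outside the equal ("white") part,
                taken here as 3 * min(r, g, b) (hypothetical simplification)
    """
    total = r + g + b
    proportions = (r / total, g / total, b / total)
    white = 3 * min(r, g, b)
    saturation_index = (total - white) / total
    return total, proportions, saturation_index


print(trichromatic_description(1.0, 1.0, 1.0))  # equal ordinates: white, index 0
print(trichromatic_description(1.0, 1.0, 0.0))  # red = green, no blue: "yellow"
```

Under this simplification a red-green pair with no blue ordinate comes out fully saturated, which is the difficulty for yellow taken up in the Discussion.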

The three response curves are con- 
sidered to be independent of lumi- 
nance. At first it was believed that 
hue and saturation were invariant 
with changes in luminance. When it 
was established that this expectation 
was contrary to the facts, a new hy- 
pothesis was brought forward by 
Helmholtz, viz., that the receptor 
responses do not increase in propor- 
tion to their stimuli but obey a law of 
diminishing returns (Purdy, 1931). 
For a given increase in spectral lu- 
minance, the weakest of the three re- 
ceptor systems will increase relatively 
more than the two stronger systems, 
thus increasing the amount of white 
being experienced. Spectral wave 
lengths therefore become more de- 
saturated with increases in lumi- 
nance. 
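The diminishing-returns argument can be checked with any compressive response function. The sketch assumes the form x/(x + k) and hypothetical ordinates (1.0, 0.5, 0.1); neither comes from Helmholtz. Because the weakest response grows relatively faster, the equal "white" part takes a larger share of the total as luminance rises, so the saturation index falls.

```python
def compressed(x: float, k: float = 1.0) -> float:
    """Hypothetical diminishing-returns response: grows with x, but ever more slowly."""
    return x / (x + k)


def saturation_index(luminance: float, ordinates=(1.0, 0.5, 0.1)) -> float:
    """Share of the compressed response lying outside the equal ("white") part."""
    responses = [compressed(luminance * o) for o in ordinates]
    total = sum(responses)
    return (total - 3 * min(responses)) / total


levels = [0.1, 1.0, 10.0, 100.0]
indices = [saturation_index(lum) for lum in levels]
for lum, s in zip(levels, indices):
    print(f"luminance {lum:>6}: saturation index {s:.3f}")
```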


Color-Blind Vision 


It was first assumed from general 
considerations of heredity and evolution that normal and color-blind
vision are related. The empirical 
study of the latter was therefore of 
great significance for theories of 
normal color vision. It was believed 
that dichromasy was caused by the 
total loss of a fundamental system 
(Helmholtz, 1867; König & Dieterici, 1886; Maxwell, 1855b). Such a hypothesis seemed to explain the color
mixture, luminosity, and wave length 
discrimination data of dichromats. 
White was experienced when the two 
mechanisms were equally stimulated, 
a hypothesis that could also be used 
to explain the neutral point. When 
monocular color-blind persons (with 
deuteranopic or protanopic vision in 
one eye) claimed they saw a blue and 
a yellow in a spectrum with the de- 
fective eye (Judd, 1948), the sup- 
porters of a reduction system were at 
a loss. The system they hypothesized 
could not explain the perception of 
yellow in dichromats. Some of them 
turned to a "fusion theory" to account for the reported color experiences of the dichromats.

The original statement of the 
fusion theory is attributed to Helm- 
holtz by Pole (1893). If Helmholtz 
intended to describe such a concept 
in 1867, he did not make himself very 
clear (Helmholtz, 1867, p. 848). The 
first straightforward description of it 
was given by John Aitken in 1872. 
Discussing possible changes in the 
shape and number of excitation 
curves that would account for color 
blindness, Aitken (1872) suggested 
that in some cases, "the nerves might be so constructed that the red nerves might be sensitive to all the rays to which the green nerves are sensitive," so that both nerves being excited at the same time, "the sensation produced would be what we call yellow."
Leber (1873) and Fick (1874) put 
forward the same viewpoint and sug- 
gested it as an explanation for both 
dichromatism and the extrafoveal 
color vision of the normal trichromat. 
Fick (1879, 1890) became so strong a supporter of this theory that it often bears his name.

The fusion concept postulates that 
the perception of yellow in color-
blind vision originates in the central 
nervous system. While it can be 
made to account for color mixture, 
wave length discrimination, and neu- 
tral point of the dichromats, it seems 
to be irreconcilable with their lumin- 
osity functions. Helmholtz in 1885 
believed he had found an application 
of the fusion concept that would ex- 
plain the dichromatic luminosity 
function (1896, p. 369). He pointed 
out that if the green curve shifted to the red curve, the colors at the red end of the spectrum would appear comparatively bright and the green wave lengths less bright. This condition would describe deuteranopia. If the red curve shifted over the green curve, there would be reduced sensitivity for red wave lengths, and the green wave lengths around 540 mμ would appear comparatively bright. This would describe protanopia.


However, it should be recognized that if dichromasy represents fusion and not loss, the luminosity at the protanopic maximum (540 mμ) should be higher than normal at this point and higher along the shorter wave lengths. By the same logic, the maximum luminosity around 570 mμ and the luminosity for the longer wave lengths should be higher for the deuteranope than for the normal observer. Unfortunately, the dichromatic luminosity curves do not fulfill these expectations.
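The expectation can be reproduced with toy curves. The sketch assumes Gaussian receptor curves of equal height and width, peaks at 540 and 570 mμ echoing the dichromatic maxima quoted above, luminosity as the sum of red and green responses, and a negligible blue contribution. All of these are simplifying assumptions. "Fusion" duplicates the surviving curve, while "reduction" simply drops the missing one.

```python
import math


def sensitivity(wave_length: float, peak: float, width: float = 40.0) -> float:
    """Hypothetical receptor curve: a unit-height Gaussian centred on `peak` (mμ)."""
    return math.exp(-((wave_length - peak) ** 2) / (2 * width ** 2))


GREEN_PEAK, RED_PEAK = 540.0, 570.0  # assumed peaks


def luminosity(wave_length: float, observer: str) -> float:
    """Luminosity as the sum of red and green responses; blue ignored (assumption)."""
    red = sensitivity(wave_length, RED_PEAK)
    green = sensitivity(wave_length, GREEN_PEAK)
    return {
        "normal": red + green,
        "protan_reduction": green,   # red system lost
        "protan_fusion": 2 * green,  # red curve shifted onto the green curve
        "deutan_reduction": red,     # green system lost
        "deutan_fusion": 2 * red,    # green curve shifted onto the red curve
    }[observer]


for obs in ("normal", "protan_reduction", "protan_fusion"):
    print(obs, round(luminosity(540, obs), 3))
```

Whatever curve shapes are chosen, doubling the surviving curve raises luminosity wherever that curve exceeds the one it replaced, which is the prediction the dichromatic data fail to bear out.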

The existing contradictions between trichromatic theories ("fusion" and "reduction") and the dichromatic luminosity function have led some
theorists to take the stand that 
whereas deuteranopia represents a 
fusion of red and green excitations, 
protanopia can be explained only by 
a simple reduction system (Pitt, 
1945). Such a stand does not resolve 
the contradictions described above.
Other theorists feel that since the
fusion theory explains more facts of 
color blindness than the reduction 
theory, it can be modified to ex- 
plain dichromatic luminosity losses. 
Graham and Hsia’s (1958) double- 
shift concept is an example of such 
a formulation. Others postulate 
a separate white mechanism which is 
assumed to be either entirely or 
mainly responsible for brightness 
(Hunt, 1952; Piéron, 1952). 


DISCUSSION AND CONCLUSIONS

Normal color vision is trichromatic. 
This statement is sometimes taken 
incorrectly to mean that a spectral 
monochromatic band of wave lengths 
can be matched by a mixture of three 
primaries. What is in fact the case is 
that the monochromatic band may 
be mixed with one of the primaries to 
give a two-color mixture that can 
match the mixture of the other two 
primaries. In the algebraic representation of the situation, the primary that is mixed with the test monochromatic band is given a negative sign, as, for example, in the equation $cC = rR + gG - bB$, which means that $c$ units of test color $C$ mixed with $b$ units of primary $B$ (blue) match $r$ units of primary $R$ (red) plus $g$ units of primary $G$ (green). The fact that for color mixtures the amounts of the colors sum additively is called Grassmann's law. As applied to luminances, the law is called Abney's law, and it is probably nearly correct for appropriate conditions of measurement.
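Because Grassmann's law makes matching linear, the amounts $r$, $g$, $b$ for a test light are the solution of three simultaneous equations, one per receptor system. The sketch below solves them by Cramer's rule. The receptor responses of the primaries and of the test light are hypothetical; with these values it is the red primary, rather than the blue of the example above, that comes out negative, so the match reads $C = gG + bB - |r|R$.

```python
Vector = tuple[float, float, float]


def det3(m: list[Vector]) -> float:
    """Determinant of a 3x3 matrix given as three rows."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


def match_amounts(test: Vector, red: Vector, green: Vector, blue: Vector) -> Vector:
    """Solve r*R + g*G + b*B = C by Cramer's rule.

    Each argument is a light's (hypothetical) response in three receptor systems.
    A negative amount means that primary must be added to the test light instead.
    """
    columns = [red, green, blue]
    matrix = [tuple(col[row] for col in columns) for row in range(3)]
    d = det3(matrix)
    amounts = []
    for k in range(3):
        replaced = [tuple(test[row] if j == k else columns[j][row] for j in range(3))
                    for row in range(3)]
        amounts.append(det3(replaced) / d)
    return tuple(amounts)


R = (0.9, 0.3, 0.0)
G = (0.2, 0.8, 0.1)
B = (0.0, 0.1, 0.9)
C = (0.0, 0.6, 0.4)  # hypothetical spectral blue-green outside the R, G, B gamut
r, g, b = match_amounts(C, R, G, B)
print(f"r = {r:.3f}, g = {g:.3f}, b = {b:.3f}")
```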

The data of color mixture, i.e., the 
combinations of quantities of pri- 
maries required to match mono- 
chromatic spectral colors, are given 
by the spectral mixture curves as well 
as by the chromaticity coordinates 
for which intensity of light is treated 
as constant. 
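Chromaticity coordinates remove intensity by dividing each amount by the sum of the three. A minimal sketch with hypothetical amounts: doubling every amount (twice the light) leaves the coordinates unchanged, and a negative amount simply carries through.

```python
def chromaticity(r: float, g: float, b: float) -> tuple[float, float, float]:
    """Chromaticity coordinates: each amount divided by the sum, so intensity drops out."""
    total = r + g + b
    return (r / total, g / total, b / total)


once = chromaticity(-0.171, 0.769, 0.359)   # hypothetical match amounts
twice = chromaticity(-0.342, 1.538, 0.718)  # same light at double intensity
print(once)
```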

It has been suggested that the negative values applying to the "desaturating" primary may represent inhibitory neural effects. In
other cases it has been thought that 
the negative values are measures of 
desaturation produced by overlap- 
ping of fundamental curves repre- 
senting the spectral absorptions of 
the basic receptors. The possibility 
that the negative values may be due 
to the presence of more than three 
receptors or processes is usually ig- 
nored. 

The trichromatic theory implies 
that one set of a possible infinity of 
sets of primaries should describe the 
data of color mixture. We know today that three primaries, properly chosen, can do this, and that the facts of color mixture can also be satisfied by a theory that assumes more than three processes: four (Hurvich & Jameson, 1955) or seven (Hartridge, 1950). It should be recognized that color mixture alone cannot provide the foundation stone of the theory.
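Why matches alone cannot single out one set of primaries can be seen by transforming tristimulus values with any invertible 3x3 matrix: the numbers change, but every match and mismatch is preserved. In this sketch the matrix and the lights are hypothetical; "x" and "y" stand for physically different lights that are metamers (equal tristimulus values).

```python
def transform(m: list[list[float]], v: tuple[float, ...]) -> tuple[float, ...]:
    """Multiply a 3x3 matrix by a 3-vector."""
    return tuple(sum(m[i][j] * v[j] for j in range(3)) for i in range(3))


def matches(t1, t2, tol: float = 1e-9) -> bool:
    """Two lights match if their tristimulus values agree component by component."""
    return all(abs(a - b) <= tol for a, b in zip(t1, t2))


def match_table(lights, m=None):
    """For every pair of lights, record whether they match (optionally after transforming)."""
    coords = {n: (transform(m, t) if m is not None else t) for n, t in lights.items()}
    names = sorted(coords)
    return {(p, q): matches(coords[p], coords[q])
            for i, p in enumerate(names) for q in names[i + 1:]}


A_TO_B = [[0.8, 0.3, -0.1], [0.1, 0.9, 0.0], [0.0, 0.2, 1.1]]  # hypothetical, invertible
LIGHTS = {"x": (0.2, 0.5, 0.3), "y": (0.2, 0.5, 0.3), "z": (0.4, 0.4, 0.2)}
table_a = match_table(LIGHTS)
table_b = match_table(LIGHTS, A_TO_B)
print(table_a == table_b)
```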

It is occasionally admitted by supporters of the trichromatic theory that the differences in spectral saturation are difficult to explain with the theoretical argument concerning the "white-producing" ordinates (Wright, 1946). Most of the published fundamental response curves show only two receptor curves from about 530 mμ to 700 mμ (Wright, 1946). Thus the conditions necessary for white are missing here, and the fact that yellow is the least saturated of spectral colors is left unexplained.

The diminishing-returns hypothesis extended by Helmholtz cannot explain the Bezold-Brücke phenomenon, i.e., the fact that changes in hue occur with changes in luminance, nor does it lead to a satisfactory explanation of changes in spectral saturation with changes in luminance. According to the hypothesis, saturation should progressively decrease with increases in luminance
and hues should be maximally satu- 
rated at threshold. As noted pre- 
viously, the experimental findings are 
that as luminance increases from 
absolute threshold to a high value, 
the saturation first increases to a 
maximum and then decreases (Purdy, 
1931). 

It is obvious that the assumption 
of diminishing returns cannot be 
reconciled with the assumption of ad- 
ditivity of luminosities (Abney’s law). 
The latter hypothesis, extended to 
account for spectral brightness differ- 
ences, necessitates a fixed ratio of the 
three response ordinates for all lumi- 
nance levels. 
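The incompatibility can be shown numerically with an assumed compressive response x/(x + k), a stand-in chosen for illustration. The response to a mixture is smaller than the sum of the responses to its parts, so luminosities would not add; and the ratio of two ordinates that stand at 2:1 in the stimulus drifts toward 1:1 as luminance rises, instead of staying fixed as the additivity argument requires.

```python
def compressed(x: float, k: float = 1.0) -> float:
    """Hypothetical diminishing-returns response."""
    return x / (x + k)


a, b = 0.6, 0.9  # luminances of two lights, arbitrary units
whole = compressed(a + b)
parts = compressed(a) + compressed(b)
print(f"response to the mixture: {whole:.3f}; sum of separate responses: {parts:.3f}")

# Ratio of two ordinates that are 2:1 in the stimulus, at increasing luminance.
ratios = [compressed(2 * lum) / compressed(lum) for lum in (0.1, 1.0, 10.0)]
print([round(q, 3) for q in ratios])
```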

The experimental data from anom- 
alous trichromats are meagre, con- 
troversial, and offer difficulties to the 
theorists. Some workers attribute 
anomalous trichromasy to varying 
degrees of defect in either the red or 
green fundamental system (Pitt, 
1949; Wright, 1946) and others fall 
back on the shift-theory or its modifi- 
cations to explain it (Abney & Wat- 
son, 1913; Pitt, 1935, Appendix II). 
Nelson (1938) and Pitt (1949) suggest 
that deuteranomaly may sometimes 
be due to a red response curve that 
exceeds the height of the normal red 
curve. 

After more than a century of scien- 
tific research in color vision the tri- 
chromatic theory continues to face 
theoretical contradictions and unexplained facts. Trichromatic theorists everywhere should rigorously examine the theory's basic assumptions, provide much more experimental data on the basic visual functions, and honestly ask themselves the question: should the theory be subject to drastic revision, or should it be replaced by some other theory?